The Policy Maker of the company wants to enable and establish a viable business model to expand the customer base.
In the last campaign, the company contacted customers at random, without using the available information, and 18% of them purchased a package.
Currently, there are 5 types of packages the company is offering - Basic, Standard, Deluxe, Super Deluxe, King.
One of the ways to expand the customer base is to introduce a new offering of packages. The company is now planning to launch a new product i.e. Wellness Tourism Package.
Explore and visualize the dataset.
Enable and establish a viable business model to expand the customer base
Generate a set of insights and recommendations that will help the business.
Predict which customer is more likely to purchase the newly introduced travel package.
Data Dictionary -
Customer details:
Customer interaction data:
# this will help in making the Python code more structured automatically (good coding practice)
%load_ext nb_black
# silence unnecessary warnings
import warnings
warnings.filterwarnings("ignore")
# Libraries to help with reading, manipulating and visualizing data
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# To build sklearn / xgboost model
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import (
    BaggingClassifier,
    RandomForestClassifier,
    GradientBoostingClassifier,
    AdaBoostClassifier,
    StackingClassifier,
)
from xgboost import XGBClassifier
from sklearn import metrics
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.linear_model import LogisticRegression
# Removes the limit from the number of displayed columns and rows.
# This is so I can see the entire dataframe when I print it
pd.set_option("display.max_columns", None)
# pd.set_option('display.max_rows', None)
pd.set_option("display.max_rows", 500)
tourism = pd.ExcelFile("Tourism.xlsx")
# see all sheet names
print(
    f"The Excel file contains the sheets \033[1m{tourism.sheet_names}\033[m; let's check them to see which one has the data we are going to use."
)
The Excel file contains the sheets ['Data Dict', 'Tourism']; let's check them to see which one has the data we are going to use.
tourism.parse("Data Dict").head() # reading a specific sheet to DataFrame
| Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | |
|---|---|---|---|---|
| 0 | NaN | Data | Variable | Discerption |
| 1 | NaN | Tourism | CustomerID | Unique customer ID |
| 2 | NaN | Tourism | ProdTaken | Whether the customer has purchased a package o... |
| 3 | NaN | Tourism | Age | Age of customer |
| 4 | NaN | Tourism | TypeofContact | How customer was contacted (Company Invited or... |
The Data Dict sheet only summarizes the variables and their descriptions.
tourism.parse("Tourism").head()
| CustomerID | ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 200000 | 1 | 41.0 | Self Enquiry | 3 | 6.0 | Salaried | Female | 3 | 3.0 | Deluxe | 3.0 | Single | 1.0 | 1 | 2 | 1 | 0.0 | Manager | 20993.0 |
| 1 | 200001 | 0 | 49.0 | Company Invited | 1 | 14.0 | Salaried | Male | 3 | 4.0 | Deluxe | 4.0 | Divorced | 2.0 | 0 | 3 | 1 | 2.0 | Manager | 20130.0 |
| 2 | 200002 | 1 | 37.0 | Self Enquiry | 1 | 8.0 | Free Lancer | Male | 3 | 4.0 | Basic | 3.0 | Single | 7.0 | 1 | 3 | 0 | 0.0 | Executive | 17090.0 |
| 3 | 200003 | 0 | 33.0 | Company Invited | 1 | 9.0 | Salaried | Female | 2 | 3.0 | Basic | 3.0 | Divorced | 2.0 | 1 | 5 | 1 | 1.0 | Executive | 17909.0 |
| 4 | 200004 | 0 | NaN | Self Enquiry | 1 | 8.0 | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Divorced | 1.0 | 0 | 5 | 1 | 0.0 | Executive | 18468.0 |
The Tourism sheet has the information we need to build our project.
# Load the data into a pandas DataFrame
tourism = pd.read_excel("Tourism.xlsx", sheet_name=1) # choosing the second sheet
# Copying the data to another variable to avoid any changes to the original data
df = tourism.copy()
# Understanding the shape of the data
print(
    f"There are \033[1;4m{df.shape[0]}\033[m rows and \033[1;4m{df.shape[1]}\033[m columns."
)
# Look at 15 random rows
# Setting the random seed via np.random.seed to get the same random results every time
np.random.seed(1)
df.sample(n=15)
There are 4888 rows and 20 columns.
| CustomerID | ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3015 | 203015 | 0 | 27.0 | Company Invited | 1 | 7.0 | Salaried | Female | 4 | 6.0 | Basic | 3.0 | Married | 5.0 | 0 | 4 | 1 | 3.0 | Executive | 23042.0 |
| 1242 | 201242 | 0 | 40.0 | Self Enquiry | 3 | 13.0 | Small Business | Male | 2 | 3.0 | King | 4.0 | Single | 2.0 | 0 | 4 | 1 | NaN | VP | 34833.0 |
| 3073 | 203073 | 0 | 29.0 | Self Enquiry | 2 | 15.0 | Small Business | Male | 4 | 5.0 | Basic | 3.0 | Married | 3.0 | 0 | 2 | 0 | 2.0 | Executive | 23614.0 |
| 804 | 200804 | 0 | 48.0 | Company Invited | 1 | 6.0 | Small Business | Male | 2 | 1.0 | Super Deluxe | 3.0 | Single | 3.0 | 0 | 2 | 0 | 0.0 | AVP | 31885.0 |
| 3339 | 203339 | 0 | 32.0 | Self Enquiry | 1 | 18.0 | Small Business | Male | 4 | 4.0 | Deluxe | 5.0 | Divorced | 3.0 | 1 | 2 | 0 | 3.0 | Manager | 25511.0 |
| 3080 | 203080 | 1 | 36.0 | Company Invited | 1 | 32.0 | Salaried | Female | 4 | 4.0 | Basic | 4.0 | Married | 3.0 | 1 | 3 | 0 | 1.0 | Executive | 20700.0 |
| 2851 | 202851 | 0 | 46.0 | Self Enquiry | 1 | 17.0 | Salaried | Male | 4 | 4.0 | Basic | 3.0 | Divorced | 5.0 | 0 | 5 | 1 | 1.0 | Executive | 21332.0 |
| 2883 | 202883 | 1 | 32.0 | Company Invited | 1 | 27.0 | Salaried | Male | 4 | 4.0 | Standard | 3.0 | Divorced | 5.0 | 0 | 3 | 1 | 1.0 | Senior Manager | 28502.0 |
| 1676 | 201676 | 0 | 22.0 | Self Enquiry | 1 | 11.0 | Salaried | Male | 2 | 1.0 | Basic | 4.0 | Married | 2.0 | 1 | 4 | 1 | 0.0 | Executive | 17328.0 |
| 1140 | 201140 | 0 | 44.0 | Self Enquiry | 1 | 13.0 | Small Business | Female | 2 | 3.0 | King | 3.0 | Married | 1.0 | 1 | 4 | 1 | 1.0 | VP | 34049.0 |
| 748 | 200748 | 1 | 26.0 | Company Invited | 3 | 35.0 | Small Business | Male | 3 | NaN | Deluxe | 5.0 | Single | 1.0 | 0 | 3 | 0 | 0.0 | Manager | 19969.0 |
| 2394 | 202394 | 1 | NaN | Company Invited | 1 | 8.0 | Salaried | Female | 2 | 4.0 | Basic | 5.0 | Single | 3.0 | 1 | 3 | 0 | 0.0 | Executive | 18506.0 |
| 4881 | 204881 | 1 | 41.0 | Self Enquiry | 2 | 25.0 | Salaried | Male | 3 | 2.0 | Basic | 5.0 | Married | 2.0 | 0 | 1 | 1 | 2.0 | Executive | 21065.0 |
| 3415 | 203415 | 0 | 52.0 | Self Enquiry | 1 | 18.0 | Large Business | Female | 3 | 5.0 | Super Deluxe | 4.0 | Single | 5.0 | 0 | 1 | 0 | 2.0 | AVP | 31820.0 |
| 2253 | 202253 | 0 | NaN | Self Enquiry | 1 | 13.0 | Large Business | Female | 2 | 1.0 | Basic | 3.0 | Married | 2.0 | 0 | 3 | 0 | 0.0 | Executive | 18376.0 |
ProdTaken: is our target variable.
Some columns, like Age and NumberOfChildrenVisiting, have NaN values; we will treat them in the missing-values step.
Age is stored as float; we will convert it to int.
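A plain `astype(int)` fails while a column still contains NaN, because NumPy integers cannot represent missing values. The following is a minimal sketch on a toy Series (not the real Age column) of two possible routes: pandas' nullable `Int64` dtype, or converting after the gaps are filled. The toy values and the median fill are illustrative assumptions, not the treatment chosen later in this notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for df["Age"]: floats with one missing value
age = pd.Series([41.0, 49.0, np.nan, 33.0], name="Age")

# Option 1: the nullable integer dtype keeps the gap as <NA>
age_nullable = age.astype("Int64")

# Option 2: fill the gap first (median is only an example strategy),
# after which a plain integer conversion is possible
age_filled = age.fillna(age.median()).astype(int)

print(age_nullable.tolist())
print(age_filled.tolist())
```

Option 1 is convenient during exploration; option 2 is closer to what a model would need, since most estimators do not accept missing values.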
# Analyzing how the target variable is distributed in the data set (as proportions)
df["ProdTaken"].value_counts() / len(df["ProdTaken"])
0    0.811784
1    0.188216
Name: ProdTaken, dtype: float64
We have an imbalanced data set, with 81.2% of customers that didn't buy a travel package and only 18.8% that bought it.
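With roughly one buyer for every four non-buyers, a random split can leave the train and test sets with slightly different class ratios. A hedged sketch of a stratified split with `train_test_split` is shown below; the synthetic data, the 70/30 split and the seed are assumptions for illustration, not settings taken from this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic target with about the same positive rate as ProdTaken (~19%)
rng = np.random.default_rng(1)
y = (rng.random(1000) < 0.19).astype(int)
X = rng.normal(size=(1000, 3))  # placeholder features

# stratify=y keeps the positive rate nearly identical in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

print(f"overall={y.mean():.3f} train={y_train.mean():.3f} test={y_test.mean():.3f}")
```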
# Checking the data types of the columns for the dataset
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4888 entries, 0 to 4887
Data columns (total 20 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   CustomerID                4888 non-null   int64
 1   ProdTaken                 4888 non-null   int64
 2   Age                       4662 non-null   float64
 3   TypeofContact             4863 non-null   object
 4   CityTier                  4888 non-null   int64
 5   DurationOfPitch           4637 non-null   float64
 6   Occupation                4888 non-null   object
 7   Gender                    4888 non-null   object
 8   NumberOfPersonVisiting    4888 non-null   int64
 9   NumberOfFollowups         4843 non-null   float64
 10  ProductPitched            4888 non-null   object
 11  PreferredPropertyStar     4862 non-null   float64
 12  MaritalStatus             4888 non-null   object
 13  NumberOfTrips             4748 non-null   float64
 14  Passport                  4888 non-null   int64
 15  PitchSatisfactionScore    4888 non-null   int64
 16  OwnCar                    4888 non-null   int64
 17  NumberOfChildrenVisiting  4822 non-null   float64
 18  Designation               4888 non-null   object
 19  MonthlyIncome             4655 non-null   float64
dtypes: float64(7), int64(7), object(6)
memory usage: 763.9+ KB
# Converting all columns with data type 'object' to 'category'
cat_col = df.select_dtypes(include=["object"]).columns.tolist()
df[cat_col] = df[cat_col].astype("category")
# Checking the data types of the columns for the dataset
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4888 entries, 0 to 4887
Data columns (total 20 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   CustomerID                4888 non-null   int64
 1   ProdTaken                 4888 non-null   int64
 2   Age                       4662 non-null   float64
 3   TypeofContact             4863 non-null   category
 4   CityTier                  4888 non-null   int64
 5   DurationOfPitch           4637 non-null   float64
 6   Occupation                4888 non-null   category
 7   Gender                    4888 non-null   category
 8   NumberOfPersonVisiting    4888 non-null   int64
 9   NumberOfFollowups         4843 non-null   float64
 10  ProductPitched            4888 non-null   category
 11  PreferredPropertyStar     4862 non-null   float64
 12  MaritalStatus             4888 non-null   category
 13  NumberOfTrips             4748 non-null   float64
 14  Passport                  4888 non-null   int64
 15  PitchSatisfactionScore    4888 non-null   int64
 16  OwnCar                    4888 non-null   int64
 17  NumberOfChildrenVisiting  4822 non-null   float64
 18  Designation               4888 non-null   category
 19  MonthlyIncome             4655 non-null   float64
dtypes: category(6), float64(7), int64(7)
memory usage: 564.4 KB
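The reported memory usage drops from 763.9+ KB to 564.4 KB after the conversion. The sketch below, on a toy column rather than the real data, illustrates why: a `category` column stores each distinct label once plus a small integer code per row. The toy labels and row count are assumptions for illustration.

```python
import pandas as pd

# Toy column: a few distinct labels repeated many times
occupation = pd.Series(["Salaried", "Small Business", "Large Business"] * 1000)

# deep=True counts the actual string storage, not just the pointers
as_object = occupation.memory_usage(deep=True)
as_category = occupation.astype("category").memory_usage(deep=True)

print(f"plain strings: {as_object} bytes, category: {as_category} bytes")
```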
# looking at which columns have the most missing values
df.isnull().sum().sort_values(ascending=False)
DurationOfPitch             251
MonthlyIncome               233
Age                         226
NumberOfTrips               140
NumberOfChildrenVisiting     66
NumberOfFollowups            45
PreferredPropertyStar        26
TypeofContact                25
Passport                      0
MaritalStatus                 0
ProductPitched                0
Designation                   0
NumberOfPersonVisiting        0
Gender                        0
Occupation                    0
PitchSatisfactionScore        0
CityTier                      0
OwnCar                        0
ProdTaken                     0
CustomerID                    0
dtype: int64
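Raw counts are easier to judge next to the share of rows they represent (for example, 251 of 4888 rows is about 5.1%). A small sketch of such a summary follows; `missing_summary` is a hypothetical helper introduced here for illustration, and the toy frame stands in for `df`.

```python
import numpy as np
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, largest first."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame(
        {"missing": counts, "percent": (100 * counts / len(frame)).round(2)}
    )
    # Keep only the columns that actually have gaps
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy frame standing in for df
toy = pd.DataFrame(
    {
        "Age": [41.0, np.nan, 37.0, np.nan],
        "CityTier": [3, 1, 1, 1],
        "MonthlyIncome": [20993.0, 20130.0, np.nan, 17909.0],
    }
)
report = missing_summary(toy)
print(report)
```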
df.describe().T
| count | mean | std | min | 25% | 50% | 75% | max | |
|---|---|---|---|---|---|---|---|---|
| CustomerID | 4888.0 | 202443.500000 | 1411.188388 | 200000.0 | 201221.75 | 202443.5 | 203665.25 | 204887.0 |
| ProdTaken | 4888.0 | 0.188216 | 0.390925 | 0.0 | 0.00 | 0.0 | 0.00 | 1.0 |
| Age | 4662.0 | 37.622265 | 9.316387 | 18.0 | 31.00 | 36.0 | 44.00 | 61.0 |
| CityTier | 4888.0 | 1.654255 | 0.916583 | 1.0 | 1.00 | 1.0 | 3.00 | 3.0 |
| DurationOfPitch | 4637.0 | 15.490835 | 8.519643 | 5.0 | 9.00 | 13.0 | 20.00 | 127.0 |
| NumberOfPersonVisiting | 4888.0 | 2.905074 | 0.724891 | 1.0 | 2.00 | 3.0 | 3.00 | 5.0 |
| NumberOfFollowups | 4843.0 | 3.708445 | 1.002509 | 1.0 | 3.00 | 4.0 | 4.00 | 6.0 |
| PreferredPropertyStar | 4862.0 | 3.581037 | 0.798009 | 3.0 | 3.00 | 3.0 | 4.00 | 5.0 |
| NumberOfTrips | 4748.0 | 3.236521 | 1.849019 | 1.0 | 2.00 | 3.0 | 4.00 | 22.0 |
| Passport | 4888.0 | 0.290917 | 0.454232 | 0.0 | 0.00 | 0.0 | 1.00 | 1.0 |
| PitchSatisfactionScore | 4888.0 | 3.078151 | 1.365792 | 1.0 | 2.00 | 3.0 | 4.00 | 5.0 |
| OwnCar | 4888.0 | 0.620295 | 0.485363 | 0.0 | 0.00 | 1.0 | 1.00 | 1.0 |
| NumberOfChildrenVisiting | 4822.0 | 1.187267 | 0.857861 | 0.0 | 1.00 | 1.0 | 2.00 | 3.0 |
| MonthlyIncome | 4655.0 | 23619.853491 | 5380.698361 | 1000.0 | 20346.00 | 22347.0 | 25571.00 | 98678.0 |
PreferredPropertyStar and PitchSatisfactionScore are ordinal categorical variables. We will keep them numeric and treat them as label encoded, since their values have a meaningful order.
Passport and OwnCar are binary categorical variables. We will also keep them numeric (label encoded).
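To make the distinction concrete, here is a minimal sketch on a toy frame: ordinal and binary columns stay as integers, while a nominal column such as Occupation is expanded into dummy variables with `pd.get_dummies`. The toy rows and the use of `drop_first=True` are assumptions for illustration, not choices confirmed by this notebook.

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "PreferredPropertyStar": [3, 4, 5, 3],  # ordinal, already numeric
        "Passport": [1, 0, 1, 0],  # binary, already numeric
        "Occupation": ["Salaried", "Small Business", "Salaried", "Large Business"],
    }
)

# One-hot encode only the nominal column; drop_first removes one redundant dummy
encoded = pd.get_dummies(toy, columns=["Occupation"], drop_first=True)
print(encoded.columns.tolist())
```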
df.describe(include="category").T
| count | unique | top | freq | |
|---|---|---|---|---|
| TypeofContact | 4863 | 2 | Self Enquiry | 3444 |
| Occupation | 4888 | 4 | Salaried | 2368 |
| Gender | 4888 | 3 | Male | 2916 |
| ProductPitched | 4888 | 5 | Basic | 1842 |
| MaritalStatus | 4888 | 4 | Married | 2340 |
| Designation | 4888 | 5 | Executive | 1842 |
for i in cat_col:
    print(f"Unique values in \033[1m{i}\033[m are:")
    print(df[i].value_counts())
    print("\n", "*" * 50, "\n")
Unique values in TypeofContact are:
Self Enquiry       3444
Company Invited    1419
Name: TypeofContact, dtype: int64

 **************************************************

Unique values in Occupation are:
Salaried          2368
Small Business    2084
Large Business     434
Free Lancer          2
Name: Occupation, dtype: int64

 **************************************************

Unique values in Gender are:
Male       2916
Female     1817
Fe Male     155
Name: Gender, dtype: int64

 **************************************************

Unique values in ProductPitched are:
Basic           1842
Deluxe          1732
Standard         742
Super Deluxe     342
King             230
Name: ProductPitched, dtype: int64

 **************************************************

Unique values in MaritalStatus are:
Married      2340
Divorced      950
Single        916
Unmarried     682
Name: MaritalStatus, dtype: int64

 **************************************************

Unique values in Designation are:
Executive         1842
Manager           1732
Senior Manager     742
AVP                342
VP                 230
Name: Designation, dtype: int64

 **************************************************
# Fixing Fe Male subcategory on Gender category
df.Gender = df.Gender.apply(lambda x: "Female" if x == "Fe Male" else x)
# Checking gender variable
df["Gender"].value_counts()
Male      2916
Female    1972
Name: Gender, dtype: int64
# converting Gender to Category
df["Gender"] = df["Gender"].astype("category")
# Dropping CustomerID
df = df.drop(["CustomerID"], axis=1)
df.head()
| ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 41.0 | Self Enquiry | 3 | 6.0 | Salaried | Female | 3 | 3.0 | Deluxe | 3.0 | Single | 1.0 | 1 | 2 | 1 | 0.0 | Manager | 20993.0 |
| 1 | 0 | 49.0 | Company Invited | 1 | 14.0 | Salaried | Male | 3 | 4.0 | Deluxe | 4.0 | Divorced | 2.0 | 0 | 3 | 1 | 2.0 | Manager | 20130.0 |
| 2 | 1 | 37.0 | Self Enquiry | 1 | 8.0 | Free Lancer | Male | 3 | 4.0 | Basic | 3.0 | Single | 7.0 | 1 | 3 | 0 | 0.0 | Executive | 17090.0 |
| 3 | 0 | 33.0 | Company Invited | 1 | 9.0 | Salaried | Female | 2 | 3.0 | Basic | 3.0 | Divorced | 2.0 | 1 | 5 | 1 | 1.0 | Executive | 17909.0 |
| 4 | 0 | NaN | Self Enquiry | 1 | 8.0 | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Divorced | 1.0 | 0 | 5 | 1 | 0.0 | Executive | 18468.0 |
# Checking Duration of Pitch mean
df["DurationOfPitch"].mean()
15.490834591330602
# Check Duration Of Pitch extreme values
df.sort_values(by=["DurationOfPitch"], ascending=False).head(5)
| ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3878 | 0 | 53.0 | Company Invited | 3 | 127.0 | Salaried | Male | 3 | 4.0 | Basic | 3.0 | Married | 4.0 | 0 | 1 | 1 | 2.0 | Executive | 22160.0 |
| 1434 | 0 | NaN | Company Invited | 3 | 126.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Married | 3.0 | 0 | 1 | 1 | 1.0 | Executive | 18482.0 |
| 2796 | 0 | 49.0 | Self Enquiry | 3 | 36.0 | Small Business | Female | 4 | 4.0 | Standard | 3.0 | Divorced | 5.0 | 0 | 4 | 0 | 1.0 | Senior Manager | 31182.0 |
| 2868 | 0 | 58.0 | Self Enquiry | 3 | 36.0 | Small Business | Male | 3 | 5.0 | Super Deluxe | 3.0 | Married | 5.0 | 0 | 3 | 0 | 1.0 | AVP | 32796.0 |
| 2648 | 1 | 39.0 | Self Enquiry | 1 | 36.0 | Small Business | Male | 4 | 4.0 | Deluxe | 5.0 | Divorced | 2.0 | 1 | 3 | 0 | 2.0 | Manager | 25351.0 |
# Check for type of Contact equal a Company Invited and then Duration Of Pitch extreme values
df[df.TypeofContact == "Company Invited"].sort_values(
    by=["DurationOfPitch"], ascending=False
).head(5)
| ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3878 | 0 | 53.0 | Company Invited | 3 | 127.0 | Salaried | Male | 3 | 4.0 | Basic | 3.0 | Married | 4.0 | 0 | 1 | 1 | 2.0 | Executive | 22160.0 |
| 1434 | 0 | NaN | Company Invited | 3 | 126.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Married | 3.0 | 0 | 1 | 1 | 1.0 | Executive | 18482.0 |
| 2505 | 0 | 39.0 | Company Invited | 1 | 36.0 | Salaried | Female | 3 | 4.0 | Deluxe | 3.0 | Single | 3.0 | 0 | 3 | 1 | 1.0 | Manager | 21084.0 |
| 2853 | 0 | 43.0 | Company Invited | 1 | 36.0 | Salaried | Female | 4 | 4.0 | Deluxe | 3.0 | Married | 4.0 | 0 | 3 | 1 | 3.0 | Manager | 23234.0 |
| 3200 | 0 | 33.0 | Company Invited | 1 | 36.0 | Small Business | Female | 4 | 4.0 | Basic | 3.0 | Unmarried | 2.0 | 0 | 3 | 1 | 1.0 | Executive | 22703.0 |
The two largest values, 127 and 126, appear to be data-entry errors with an extra leading 1, so we will drop that digit and keep 27 and 26 respectively.
# Replacing DurationOfPitch in row index 3878 from 127 to 27
df.loc[3878, "DurationOfPitch"] = 27
# Replacing DurationOfPitch in row index 1434 from 126 to 26
df.loc[1434, "DurationOfPitch"] = 26
# Checking data after replacement
df.sort_values(by=["DurationOfPitch"], ascending=False).head(3)
| ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4662 | 1 | 27.0 | Company Invited | 3 | 36.0 | Small Business | Male | 4 | 6.0 | Deluxe | 5.0 | Unmarried | 2.0 | 0 | 3 | 1 | 2.0 | Manager | 23647.0 |
| 3200 | 0 | 33.0 | Company Invited | 1 | 36.0 | Small Business | Female | 4 | 4.0 | Basic | 3.0 | Unmarried | 2.0 | 0 | 3 | 1 | 1.0 | Executive | 22703.0 |
| 2869 | 0 | 51.0 | Self Enquiry | 1 | 36.0 | Salaried | Male | 3 | 5.0 | Super Deluxe | 3.0 | Divorced | NaN | 0 | 5 | 1 | 2.0 | AVP | 35724.0 |
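The two corrected rows were found by sorting. A rule-based check can flag the same kind of value: with the quartiles reported by `describe()` (9 and 20), the usual 1.5 × IQR upper fence is 36.5, so only the 127 and 126 entries fall outside it. The sketch below applies that rule to a toy Series; `iqr_outliers` is a hypothetical helper and the toy values are assumptions for illustration.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the entries lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return values[mask]


# Toy pitch durations with two implausible entries, mimicking 126 and 127
duration = pd.Series([6, 8, 9, 13, 14, 15, 20, 36, 126, 127], name="DurationOfPitch")
flagged = iqr_outliers(duration)
print(flagged.tolist())
```

The rule only flags candidates; deciding whether to correct, cap or keep them remains a judgment call, as in the manual fix above.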
df.sort_values(by=["NumberOfTrips"], ascending=False).head(5)
| ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3260 | 0 | 40.0 | Company Invited | 1 | 16.0 | Salaried | Male | 4 | 4.0 | Deluxe | 4.0 | Unmarried | 22.0 | 0 | 2 | 1 | 1.0 | Manager | 25460.0 |
| 816 | 0 | 39.0 | Company Invited | 1 | 15.0 | Salaried | Male | 3 | 3.0 | Deluxe | 4.0 | Unmarried | 21.0 | 0 | 2 | 1 | 0.0 | Manager | 21782.0 |
| 2829 | 1 | 31.0 | Company Invited | 1 | 11.0 | Large Business | Male | 3 | 4.0 | Basic | 3.0 | Single | 20.0 | 1 | 4 | 1 | 2.0 | Executive | 20963.0 |
| 385 | 1 | 30.0 | Company Invited | 1 | 10.0 | Large Business | Male | 2 | 3.0 | Basic | 3.0 | Single | 19.0 | 1 | 4 | 1 | 1.0 | Executive | 17285.0 |
| 3074 | 0 | 23.0 | Self Enquiry | 1 | 7.0 | Salaried | Male | 3 | 5.0 | Deluxe | 3.0 | Divorced | 8.0 | 0 | 2 | 1 | 1.0 | Manager | 23453.0 |
We are not going to treat these values as outliers.
df.groupby(["Occupation"])["NumberOfTrips"].mean()
Occupation
Free Lancer       7.500000
Large Business    3.456019
Salaried          3.221062
Small Business    3.202877
Name: NumberOfTrips, dtype: float64
df.groupby(["Occupation"])["NumberOfTrips"].max()
Occupation
Free Lancer        8.0
Large Business    20.0
Salaried          22.0
Small Business     8.0
Name: NumberOfTrips, dtype: float64
df.sort_values(by=["MonthlyIncome"]).head(5)
| ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 142 | 0 | 38.0 | Self Enquiry | 1 | 9.0 | Large Business | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 4.0 | 1 | 5 | 0 | 0.0 | Manager | 1000.0 |
| 2586 | 0 | 39.0 | Self Enquiry | 1 | 10.0 | Large Business | Female | 3 | 4.0 | Deluxe | 3.0 | Single | 5.0 | 1 | 5 | 0 | 1.0 | Manager | 4678.0 |
| 513 | 1 | 20.0 | Self Enquiry | 1 | 16.0 | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Single | 2.0 | 1 | 5 | 0 | 0.0 | Executive | 16009.0 |
| 1983 | 1 | 20.0 | Self Enquiry | 1 | 16.0 | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Single | 2.0 | 1 | 5 | 1 | 1.0 | Executive | 16009.0 |
| 2197 | 0 | 18.0 | Company Invited | 1 | 11.0 | Salaried | Male | 3 | 3.0 | Basic | 3.0 | Single | 2.0 | 0 | 1 | 0 | 1.0 | Executive | 16051.0 |
The two lowest incomes (1000 and 4678) belong to Large Business customers who are both single and travel 4 and 5 times a year, so these incomes are outliers.
# Checking the min Monthly Income for each Occupation
df.groupby(["Occupation"])["MonthlyIncome"].min()
Occupation
Free Lancer       17090.0
Large Business     1000.0
Salaried          16051.0
Small Business    16009.0
Name: MonthlyIncome, dtype: float64
# Checking the min Monthly Income for Large Business
df[df.Occupation == "Large Business"].sort_values(by="MonthlyIncome").head(5)
| ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 142 | 0 | 38.0 | Self Enquiry | 1 | 9.0 | Large Business | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 4.0 | 1 | 5 | 0 | 0.0 | Manager | 1000.0 |
| 2586 | 0 | 39.0 | Self Enquiry | 1 | 10.0 | Large Business | Female | 3 | 4.0 | Deluxe | 3.0 | Single | 5.0 | 1 | 5 | 0 | 1.0 | Manager | 4678.0 |
| 1365 | 1 | 29.0 | Company Invited | 3 | 30.0 | Large Business | Male | 2 | 1.0 | Basic | 5.0 | Single | 2.0 | 0 | 3 | 1 | 1.0 | Executive | 16091.0 |
| 1052 | 0 | 30.0 | Company Invited | 1 | 13.0 | Large Business | Male | 3 | 3.0 | Basic | 3.0 | Married | 2.0 | 0 | 5 | 0 | 0.0 | Executive | 16274.0 |
| 977 | 0 | 34.0 | Company Invited | 1 | 32.0 | Large Business | Female | 3 | 4.0 | Basic | 3.0 | Married | 2.0 | 0 | 1 | 1 | 0.0 | Executive | 17029.0 |
Since the next-lowest income for Large Business is 16091.0, we will replace the two lower values with it.
df.MonthlyIncome = df.MonthlyIncome.apply(lambda x: 16091.0 if x < 5000 else x)
# Checking the min Monthly Income for each Occupation after replacing it
df.groupby(["Occupation"])["MonthlyIncome"].min()
Occupation
Free Lancer       17090.0
Large Business    16091.0
Salaried          16051.0
Small Business    16009.0
Name: MonthlyIncome, dtype: float64
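The same replacement can be written without a row-by-row lambda. As a sketch, `Series.mask` swaps in the new value wherever the condition holds and leaves everything else, including missing incomes, untouched. The toy values below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy incomes, including a missing value and the two implausibly low entries
income = pd.Series([1000.0, 4678.0, 16091.0, np.nan, 20993.0], name="MonthlyIncome")

# Replace values below the threshold; NaN compares as False and is left as is
fixed = income.mask(income < 5000, 16091.0)
print(fixed.tolist())
```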
df.sort_values(by="MonthlyIncome", ascending=False).head(5)
| ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2482 | 0 | 37.0 | Self Enquiry | 1 | 12.0 | Salaried | Female | 3 | 5.0 | Basic | 5.0 | Divorced | 2.0 | 1 | 2 | 1 | 1.0 | Executive | 98678.0 |
| 38 | 0 | 36.0 | Self Enquiry | 1 | 11.0 | Salaried | Female | 2 | 4.0 | Basic | NaN | Divorced | 1.0 | 1 | 2 | 1 | 0.0 | Executive | 95000.0 |
| 4104 | 0 | 53.0 | Self Enquiry | 1 | 7.0 | Salaried | Male | 4 | 5.0 | King | NaN | Married | 2.0 | 0 | 1 | 1 | 3.0 | VP | 38677.0 |
| 2634 | 0 | 53.0 | Self Enquiry | 1 | 7.0 | Salaried | Male | 4 | 5.0 | King | NaN | Divorced | 2.0 | 0 | 2 | 1 | 2.0 | VP | 38677.0 |
| 4660 | 0 | 42.0 | Company Invited | 1 | 14.0 | Salaried | Female | 3 | 6.0 | King | NaN | Married | 3.0 | 0 | 4 | 1 | 2.0 | VP | 38651.0 |
df.groupby(["Designation", "Occupation"])["MonthlyIncome"].min()
Designation Occupation
AVP Free Lancer NaN
Large Business 28120.0
Salaried 21151.0
Small Business 17705.0
Executive Free Lancer 17090.0
Large Business 16091.0
Salaried 16051.0
Small Business 16009.0
Manager Free Lancer NaN
Large Business 16091.0
Salaried 17272.0
Small Business 17042.0
Senior Manager Free Lancer NaN
Large Business 17372.0
Salaried 17372.0
Small Business 17875.0
VP Free Lancer NaN
Large Business 34232.0
Salaried 33041.0
Small Business 17517.0
Name: MonthlyIncome, dtype: float64
df.groupby(["Designation", "Gender"])["MonthlyIncome"].mean()
Designation Gender
AVP Female 32211.587500
Male 32266.945055
Executive Female 20163.681115
Male 19809.581605
Manager Female 22527.410606
Male 22754.277538
Senior Manager Female 26851.558282
Male 26470.197115
VP Female 35572.951220
Male 36048.486486
Name: MonthlyIncome, dtype: float64
# Replacing the two extreme MonthlyIncome values (row indexes 2482 and 38) with 20000
df.loc[2482, "MonthlyIncome"] = 20000
df.loc[38, "MonthlyIncome"] = 20000
df.sort_values(by="MonthlyIncome", ascending=False).head()
| ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2634 | 0 | 53.0 | Self Enquiry | 1 | 7.0 | Salaried | Male | 4 | 5.0 | King | NaN | Divorced | 2.0 | 0 | 2 | 1 | 2.0 | VP | 38677.0 |
| 4104 | 0 | 53.0 | Self Enquiry | 1 | 7.0 | Salaried | Male | 4 | 5.0 | King | NaN | Married | 2.0 | 0 | 1 | 1 | 3.0 | VP | 38677.0 |
| 3190 | 0 | 42.0 | Company Invited | 1 | 14.0 | Salaried | Female | 3 | 6.0 | King | NaN | Married | 3.0 | 0 | 4 | 1 | 1.0 | VP | 38651.0 |
| 4660 | 0 | 42.0 | Company Invited | 1 | 14.0 | Salaried | Female | 3 | 6.0 | King | NaN | Married | 3.0 | 0 | 4 | 1 | 2.0 | VP | 38651.0 |
| 3295 | 0 | 57.0 | Self Enquiry | 1 | 11.0 | Large Business | Female | 4 | 4.0 | King | NaN | Married | 6.0 | 0 | 4 | 0 | 3.0 | VP | 38621.0 |
# A function to create a boxplot and a histogram for any numerical input
def hist_boxplot(dataframe, figsize=(15, 8), bins=None):
    """
    Plot a boxplot and a histogram for a numerical variable on a shared x-axis.
    dataframe: 1-d feature array (a pandas Series such as df.Age)
    figsize: size of fig (default (15, 8))
    bins: number of bins (default None / auto)
    """
    # Figure aesthetics
    sns.set_style("white")
    # Creating the 2 subplots
    fig, (ax_box, ax_hist) = plt.subplots(nrows=2, sharex=True, figsize=figsize)
    # Boxplot will be created and a red square will indicate the mean value of the column
    sns.boxplot(
        x=dataframe,
        ax=ax_box,
        showmeans=True,
        meanprops={"marker": "s", "markerfacecolor": "red"},
        color="xkcd:eggshell",
    )
    # Histogram (histplot replaces the deprecated distplot; "auto" lets seaborn pick the bins)
    sns.histplot(
        x=dataframe,
        kde=False,
        ax=ax_hist,
        color="lightblue",
        bins=bins if bins else "auto",
    )
    # Add mean to the histogram (the pandas method skips NaN values)
    ax_hist.axvline(dataframe.mean(), color="r", linestyle="dotted")
    # Add median to the histogram (np.median would return NaN when values are missing)
    ax_hist.axvline(dataframe.median(), color="gray", linestyle="solid")
# Function to create bar plots that show the percentage for each category.
def perc_on_bar(dataframe):
    """
    plot
    feature: categorical feature
    the function won't work if a column is passed in hue parameter
    """
    total = len(dataframe)  # length of the column
    plt.figure(figsize=(15, 5))
    ax = sns.countplot(x=dataframe, palette="Paired")
    for p in ax.patches:
        percentage = "{:.1f}%".format(
            100 * p.get_height() / total
        )  # percentage of each class of the category
        x = p.get_x() + p.get_width() / 2 - 0.05  # x position of the label
        y = p.get_y() + p.get_height()  # y position of the label (top of the bar)
        ax.annotate(percentage, (x, y), size=12)  # annotate the percentage
    plt.show()  # show the plot
hist_boxplot(df.Age)
hist_boxplot(df.DurationOfPitch)
hist_boxplot(df.NumberOfPersonVisiting)
hist_boxplot(df.NumberOfFollowups)
hist_boxplot(df.MonthlyIncome)
perc_on_bar(df.ProdTaken)
perc_on_bar(df.CityTier)
perc_on_bar(df.PreferredPropertyStar)
perc_on_bar(df.NumberOfTrips)
perc_on_bar(df.Passport)
perc_on_bar(df.PitchSatisfactionScore)
perc_on_bar(df.OwnCar)
perc_on_bar(df.NumberOfChildrenVisiting)
perc_on_bar(df.TypeofContact)
perc_on_bar(df.Occupation)
perc_on_bar(df.Gender)
perc_on_bar(df.ProductPitched)
perc_on_bar(df.MaritalStatus)
perc_on_bar(df.Designation)
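# Correlation of the numeric columns with the target (category columns are excluded explicitly)
df.select_dtypes(include="number").corr()["ProdTaken"].sort_values(ascending=False)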
ProdTaken                   1.000000
Passport                    0.260844
NumberOfFollowups           0.112171
PreferredPropertyStar       0.099577
CityTier                    0.086852
DurationOfPitch             0.083796
PitchSatisfactionScore      0.051394
NumberOfTrips               0.018898
NumberOfPersonVisiting      0.009627
NumberOfChildrenVisiting    0.007421
OwnCar                     -0.011508
MonthlyIncome              -0.133944
Age                        -0.147254
Name: ProdTaken, dtype: float64
plt.figure(figsize=(12, 7))
sns.heatmap(
    df.select_dtypes(include="number").corr(),
    annot=True,
    vmin=-1,
    vmax=1,
    fmt=".2f",
    cmap="GnBu",
)
plt.show()
OwnCar, PitchSatisfactionScore, PreferredPropertyStar, DurationOfPitch and CityTier do not show correlation with any of the other variables.
ProdTaken has a 0.26 correlation with Passport, meaning that customers who have a passport are more likely to buy a travel package.
ProdTaken has a negative correlation with MonthlyIncome (-0.13) and with Age (-0.15), meaning that younger customers, who usually have lower incomes, are more likely to buy a travel package.
Age and MonthlyIncome have a 0.49 correlation.
# Creating a list with numerical variables
num_col = []
for col in df.columns:
    if (
        df[col].nunique() > 10
    ):  # filter only numerical variables with more than 10 unique values
        num_col.append(col)
# Boxplot for numerical variables
plt.figure(figsize=(12, 6))
for i, variable in enumerate(num_col):
    plt.subplot(2, 2, i + 1)
    sns.boxplot(x=df["ProdTaken"], y=df[variable])
    plt.tight_layout()
    plt.title(variable)
plt.show()
# Creating lists for the different kinds of categorical variables that we will treat as label encoded
cat2_col = []  # list of ordinal variables
catBi_col = []  # list of binary variables
for col in df.columns:
    if col not in cat_col:
        if 2 < df[col].nunique() < 10:
            cat2_col.append(col)
        elif df[col].nunique() == 2 and col != "ProdTaken":
            catBi_col.append(col)
# Count plots for categorical variables
def cat_plot(x):
    """
    plot
    feature: categorical feature
    plot subplot for variables on list x
    """
    y = 2  # number of subplot columns
    rows = int(np.ceil(len(x) / y))  # plt.subplot needs an integer number of rows
    plt.figure(figsize=(12, rows * 3))
    for i, variable in enumerate(x):
        plt.subplot(rows, y, i + 1)
        sns.countplot(x=df[variable], hue=df["ProdTaken"], palette="Pastel2")
        plt.tight_layout()
        plt.title(variable)
    plt.show()
Dummy variables
cat_plot(cat_col)
Label Encoding
cat_plot(cat2_col)
cat_plot(catBi_col)
def stacked_plot(x):
    sns.set(palette="nipy_spectral")
    tab1 = pd.crosstab(x, df["ProdTaken"], margins=True)
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(x, df["ProdTaken"], normalize="index")
    tab.plot(kind="bar", stacked=True, figsize=(10, 5))
    # place the legend outside the plot area
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
stacked_plot(df["Passport"])
ProdTaken     0    1   All
Passport
0          3040  426  3466
1           928  494  1422
All        3968  920  4888
------------------------------------------------------------------------------------------------------------------------
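The counts in this crosstab are enough to work out the purchase rate in each passport group, and also to cross-check the 0.26 correlation between Passport and ProdTaken reported earlier: for two binary variables the Pearson correlation is the phi coefficient of the 2x2 table. The sketch below uses only the counts printed above; the variable names are illustrative.

```python
from math import sqrt

# Counts copied from the crosstab: counts[passport][prod_taken]
counts = {0: {0: 3040, 1: 426}, 1: {0: 928, 1: 494}}


def purchase_rate(row):
    """Share of customers in a crosstab row who bought the package."""
    return row[1] / (row[0] + row[1])


rate_no_passport = purchase_rate(counts[0])
rate_passport = purchase_rate(counts[1])

# Phi coefficient of the 2x2 table (equals Pearson r for two 0/1 variables)
n00, n01 = counts[0][0], counts[0][1]
n10, n11 = counts[1][0], counts[1][1]
phi = (n11 * n00 - n10 * n01) / sqrt(
    (n00 + n01) * (n10 + n11) * (n00 + n10) * (n01 + n11)
)

print(f"purchase rate without passport: {rate_no_passport:.1%}")
print(f"purchase rate with passport:    {rate_passport:.1%}")
print(f"phi coefficient:                {phi:.4f}")
```

Passport holders buy at roughly 35% against roughly 12% for customers without a passport, and the phi coefficient agrees with the correlation value listed for Passport.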
stacked_plot(df["MaritalStatus"])
ProdTaken         0    1   All
MaritalStatus
Divorced        826  124   950
Married        2014  326  2340
Single          612  304   916
Unmarried       516  166   682
All            3968  920  4888
------------------------------------------------------------------------------------------------------------------------
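To compare segments on an equal footing, each group's purchase rate can be divided by the overall purchase rate, giving a lift (values above 1 mean the segment converts better than average). The following is a small sketch using the marital-status counts printed above; "lift" is a framing added here for illustration and is not a term used in the notebook.

```python
# (did not buy, bought) counts copied from the crosstab above
counts = {
    "Divorced": (826, 124),
    "Married": (2014, 326),
    "Single": (612, 304),
    "Unmarried": (516, 166),
}

total = sum(no + yes for no, yes in counts.values())
total_buyers = sum(yes for _, yes in counts.values())
overall = total_buyers / total  # overall purchase rate

rates = {status: yes / (no + yes) for status, (no, yes) in counts.items()}
lift = {status: rate / overall for status, rate in rates.items()}

for status, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{status:<10} rate={rate:.1%}  lift={lift[status]:.2f}")
```

Single customers come out with the highest rate, followed by Unmarried, while Married and Divorced customers convert below the overall average.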
# Analyzing the DurationOfPitch distribution
plt.figure(figsize=(15, 6))
# Plot settings
colors = ["red", "greenyellow"]
bins = np.linspace(0, 40, 9)  # 9 evenly spaced edges from 0 to 40, i.e. 8 bins of width 5
# Plotting stacked histogram
plot_t = sns.histplot(
    data=df,
    x="DurationOfPitch",
    bins=bins,
    hue="ProdTaken",
    multiple="stack",
    palette=colors,
)
plt.xlabel("Duration Of Pitch")
plt.ylabel("Count")
plt.xticks(
    [0, 5, 10, 15, 20, 25, 30, 35, 40]
)  # specifying ticks via the xticks function.
plt.title("Duration Of Pitch distribution")
# Calculating the length of the column
total = len(df["DurationOfPitch"])
# Looping over the bar segments to annotate each with its percentage of all rows
for t in plot_t.patches:
    percentage = "{:.1f}%".format(100 * t.get_height() / total)
    x = t.get_x() + t.get_width() / 2 - 0.75
    y = t.get_y() + t.get_height() / 2
    plot_t.annotate(percentage, (x, y), size=14)
About 30% of customers have a pitch duration between 5 and 10. This is the window our team has to convince the customer to buy our package, or at least to keep them listening for longer.
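The percentages annotated on the histogram are computed per bar segment, one per ProdTaken class in each bin, and are divided by the full column length, which includes rows where DurationOfPitch is missing. A bin's overall share is therefore the sum of its segments. A minimal sketch of computing bin shares directly, using the same bin edges, is shown below; the helper `bin_shares` and the toy durations are assumptions made for illustration.

```python
import numpy as np


def bin_shares(values, edges):
    """Return {(left, right): percent of all rows} for each histogram bin."""
    values = np.asarray(values, dtype=float)
    valid = values[~np.isnan(values)]  # missing values cannot be binned
    counts, _ = np.histogram(valid, bins=edges)
    total = len(values)  # denominator includes the missing rows
    return {
        (float(edges[i]), float(edges[i + 1])): 100 * counts[i] / total
        for i in range(len(counts))
    }


# Toy durations, invented for illustration (one missing value)
toy_durations = [6, 8, 9, 12, 14, 16, 22, 31, np.nan, 7]
edges = np.linspace(0, 40, 9)  # same edges as the histogram: 0, 5, ..., 40
shares = bin_shares(toy_durations, edges)

# Bins are half-open [left, right), except the last one, which includes 40
for (left, right), pct in shares.items():
    print(f"{left:g} to {right:g}: {pct:.1f}%")
```

Because missing rows stay in the denominator, the shares add up to less than 100% whenever the column has gaps; dividing by the number of non-missing values instead would give shares of the observed pitches only.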
# List of colors to use for the different ProdTaken values
colors = ["red", "greenyellow"]
# Plotting a scatterplot to analyze the correlation
plt.figure(figsize=(15, 8))
sns.scatterplot(
    x=df["Age"],
    y=df["MonthlyIncome"],
    hue=df["ProdTaken"],
    palette=colors,
)
plt.title("ProdTaken by Income, and Age")
Text(0.5, 1.0, 'ProdTaken by Income, and Age')
sns.catplot(x="Designation", y="MonthlyIncome", hue="Gender", kind="swarm", data=df)
<seaborn.axisgrid.FacetGrid at 0x1508c5fd370>
# Plotting a scatterplot to analyze the correlation
plt.figure(figsize=(15, 8))
sns.scatterplot(x=df["Age"], y=df["MonthlyIncome"], hue=df["Designation"])
plt.title("Age by Income, and Designation")
Text(0.5, 1.0, 'Age by Income, and Designation')
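We can see a pattern here: customers cluster by designation, and more senior designations tend to sit at higher ages and incomes.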
# Plotting a scatterplot to analyze the correlation
plt.figure(figsize=(15, 8))
sns.scatterplot(x=df["Age"], y=df["MonthlyIncome"], hue=df["Occupation"])
plt.title("Age by Income, and Occupation")
Text(0.5, 1.0, 'Age by Income, and Occupation')
cols = df[
    [
        "MonthlyIncome",
        "Age",
        "NumberOfFollowups",
        "NumberOfPersonVisiting",
        "NumberOfTrips",
    ]
].columns.tolist()
plt.figure(figsize=(15, 6))
for i, variable in enumerate(cols):
    plt.subplot(3, 2, i + 1)
    sns.boxplot(x=df["Passport"], y=df[variable], hue=df["ProdTaken"])
    plt.tight_layout()
    plt.title(variable)
    plt.legend(bbox_to_anchor=(1, 1))
plt.show()
Customers with and without a passport seem to behave almost the same with respect to Monthly Income, Age, Number of Followups, Number Of Person Visiting and Number of Trips. Further on, we'll check the behavior of only those customers who bought the travel package.
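A visual comparison like this can be backed by a small table of group medians, which is what the centre line of each box represents. The sketch below runs on a toy frame invented for illustration; in the notebook the same `groupby` call could be pointed at `df` with the columns listed above.

```python
import pandas as pd

# Toy data, invented purely for illustration
toy = pd.DataFrame(
    {
        "Passport": [0, 0, 0, 1, 1, 1],
        "ProdTaken": [0, 1, 0, 1, 0, 1],
        "Age": [30, 28, 45, 33, 41, 29],
        "MonthlyIncome": [21000, 19500, 26000, 20500, 24000, 20000],
    }
)

# One row per (Passport, ProdTaken) combination, one column per feature
summary = toy.groupby(["Passport", "ProdTaken"])[["Age", "MonthlyIncome"]].median()
print(summary)
```

If the medians for Passport 0 and Passport 1 are close within each ProdTaken group, that supports reading the boxplots as showing similar behaviour for both groups.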
cols = df[
    ["MonthlyIncome", "Age", "NumberOfFollowups", "NumberOfPersonVisiting"]
].columns.tolist()
plt.figure(figsize=(15, 6))
for i, variable in enumerate(cols):
    plt.subplot(2, 2, i + 1)
    sns.lineplot(x=df["NumberOfTrips"], y=df[variable], hue=df["ProdTaken"], ci=0)
    plt.tight_layout()
    plt.title(variable)
    plt.legend(bbox_to_anchor=(1, 1))
plt.show()
# Plotting a barplot to analyze Monthly Income, Product Pitched and ProdTaken
plt.figure(figsize=(15, 8))
sns.barplot(
    x=df["MonthlyIncome"], y=df["ProductPitched"], hue=df["ProdTaken"], palette="Set3"
)
plt.title("Monthly Income vs Product Pitched")
Text(0.5, 1.0, 'Monthly Income vs Product Pitched')
We can see a pattern here where customers with:
# Plotting a lineplot to analyze Age, Product Pitched and ProdTaken
plt.figure(figsize=(15, 8))
sns.lineplot(x=df["Age"], y=df["ProductPitched"], hue=df["ProdTaken"], palette="Set3")
plt.title("Age vs Product Pitched")
Text(0.5, 1.0, 'Age vs Product Pitched')
Here we can clearly see the relationship between Age and Product Pitched: the younger the customer, the more basic the package pitched.
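One way to put a number on this relationship is to map the packages to an ordinal tier and compute a rank correlation with Age. The sketch below is only an illustration under two assumptions: that the packages are ordered Basic, Standard, Deluxe, Super Deluxe, King from most basic to most premium, and a toy frame invented for the example. The Spearman correlation is obtained as the Pearson correlation of the ranks, which avoids any dependency beyond pandas.

```python
import pandas as pd

# ASSUMPTION: package order from most basic to most premium
package_rank = {"Basic": 0, "Standard": 1, "Deluxe": 2, "Super Deluxe": 3, "King": 4}

# Toy data, invented purely for illustration
toy = pd.DataFrame(
    {
        "Age": [24, 29, 33, 38, 44, 51, 27, 47],
        "ProductPitched": [
            "Basic",
            "Basic",
            "Standard",
            "Deluxe",
            "Super Deluxe",
            "King",
            "Basic",
            "Deluxe",
        ],
    }
)

tier = toy["ProductPitched"].map(package_rank)

# Spearman correlation = Pearson correlation computed on ranks (ties get average rank)
spearman = toy["Age"].rank().corr(tier.rank())
print(round(spearman, 3))
```

A value close to 1 would mean that older customers are consistently pitched higher tiers; on the real data the strength of the relationship would have to be read from the result rather than assumed.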
# Plotting a lineplot to analyze Age, Monthly Income and ProdTaken
plt.figure(figsize=(15, 8))
sns.lineplot(x=df["Age"], y=df["MonthlyIncome"], hue=df["ProdTaken"], palette="Set3")
plt.title("Age vs MonthlyIncome")
Text(0.5, 1.0, 'Age vs MonthlyIncome')
Here we can clearly see the correlation between Age and Monthly Income. As expected, the older the customer, the higher the income usually is.
# Plotting a scatterplot to analyze the correlation
plt.figure(figsize=(15, 8))
sns.scatterplot(x=df["Age"], y=df["MonthlyIncome"], hue=df["ProductPitched"])
plt.title("Age by Income, and ProdPitched")
Text(0.5, 1.0, 'Age by Income, and ProdPitched')
Putting it all together, we can see a pattern between Age, Income and ProductPitched.
plt.figure(figsize=(15, 7))
sns.boxplot(x="Designation", y="MonthlyIncome", hue="ProdTaken", data=df)
<AxesSubplot:xlabel='Designation', ylabel='MonthlyIncome'>
plt.figure(figsize=(15, 7))
sns.boxplot(
    x="MaritalStatus", y="MonthlyIncome", hue="NumberOfChildrenVisiting", data=df
)
<AxesSubplot:xlabel='MaritalStatus', ylabel='MonthlyIncome'>
cols = df[
    [
        "NumberOfFollowups",
        "Age",
        "NumberOfTrips",
        "NumberOfPersonVisiting",
        "DurationOfPitch",
        "MonthlyIncome",
    ]
].columns.tolist()
plt.figure(figsize=(15, 9))
for i, variable in enumerate(cols):
    plt.subplot(3, 2, i + 1)
    sns.boxplot(x="TypeofContact", y=variable, hue="ProdTaken", data=df)
    plt.tight_layout()
    plt.title(variable)
plt.show()
# Let's look only at customers who bought the travel package
df_temp = df[df["ProdTaken"] == 1]
# Let's look only at customers who did NOT buy the travel package
df_tempNO = df[df["ProdTaken"] == 0]
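Once the data is split into buyers and non-buyers, the two groups can be compared side by side as within-group shares rather than raw counts, since the groups differ a lot in size. The sketch below shows one way to do this on a toy frame invented for illustration; the names `buyers`, `non_buyers` and `shares` are not from the notebook.

```python
import pandas as pd

# Toy data, invented purely for illustration
toy = pd.DataFrame(
    {
        "ProdTaken": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
        "NumberOfFollowups": [4, 5, 4, 6, 3, 3, 4, 2, 3, 4],
    }
)

buyers = toy[toy["ProdTaken"] == 1]
non_buyers = toy[toy["ProdTaken"] == 0]

# Share of each follow-up count within its own group, aligned on a common index
shares = (
    pd.DataFrame(
        {
            "buyers": buyers["NumberOfFollowups"].value_counts(normalize=True),
            "non_buyers": non_buyers["NumberOfFollowups"].value_counts(normalize=True),
        }
    )
    .fillna(0)
    .sort_index()
)
print((shares * 100).round(1))
```

Each column sums to 100%, so a follow-up count that takes a clearly larger share among buyers than among non-buyers stands out directly.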
perc_on_bar(df_temp.ProductPitched)
perc_on_bar(df_temp.Designation)
perc_on_bar(df_temp.Passport)
perc_on_bar(df_temp.NumberOfFollowups)
perc_on_bar(df_tempNO.NumberOfFollowups)
df_temp.groupby(["Designation"])["MonthlyIncome"].mean()
Designation
AVP               29823.800000
Executive         20161.529301
Manager           23106.215385
Senior Manager    26035.419355
VP                34672.100000
Name: MonthlyIncome, dtype: float64
df_temp.groupby(["Designation"])["Age"].mean()
Designation
AVP               43.500000
Executive         31.289320
Manager           37.641414
Senior Manager    41.008130
VP                48.900000
Name: Age, dtype: float64
plt.figure(figsize=(15, 7))
sns.boxplot(x="Designation", y="MonthlyIncome", hue="ProductPitched", data=df_temp)
<AxesSubplot:xlabel='Designation', ylabel='MonthlyIncome'>
plt.figure(figsize=(15, 7))
sns.countplot(x="MaritalStatus", data=df_temp)
<AxesSubplot:xlabel='MaritalStatus', ylabel='count'>
plt.figure(figsize=(15, 7))
sns.countplot(x="Designation", hue="MaritalStatus", data=df_temp)
<AxesSubplot:xlabel='Designation', ylabel='count'>
df_temp.groupby(["Designation"])["NumberOfChildrenVisiting"].value_counts()
Designation NumberOfChildrenVisiting
AVP 1.0 9
2.0 6
0.0 4
3.0 1
Executive 1.0 234
2.0 158
0.0 118
3.0 41
Manager 1.0 90
2.0 56
0.0 45
3.0 12
Senior Manager 1.0 52
0.0 33
2.0 28
3.0 10
VP 1.0 7
2.0 5
0.0 2
3.0 2
Name: NumberOfChildrenVisiting, dtype: int64
# Plotting a countplot of DurationOfPitch for customers who bought the package
plt.figure(figsize=(15, 8))
sns.countplot(
    x=df_temp["DurationOfPitch"],
)
<AxesSubplot:xlabel='DurationOfPitch', ylabel='count'>
# Plotting a countplot of DurationOfPitch for customers who did not buy the package
plt.figure(figsize=(15, 8))
sns.countplot(
    x=df_tempNO["DurationOfPitch"],
)
<AxesSubplot:xlabel='DurationOfPitch', ylabel='count'>
# Plotting a barplot of NumberOfFollowups by DurationOfPitch for customers who bought the package
plt.figure(figsize=(15, 8))
sns.barplot(
    x=df_temp["DurationOfPitch"],
    y=df_temp["NumberOfFollowups"],
)
<AxesSubplot:xlabel='DurationOfPitch', ylabel='NumberOfFollowups'>
# Plotting a scatterplot to analyze the correlation
plt.figure(figsize=(15, 8))
sns.scatterplot(
    x=df_temp["Age"],
    y=df_temp["MonthlyIncome"],
    hue=df_temp["ProductPitched"],
    size=df_temp["Passport"],
)
plt.title("Age by Income, and ProdPitched")
Text(0.5, 1.0, 'Age by Income, and ProdPitched')
plt.figure(figsize=(15, 7))
sns.countplot(x="ProductPitched", hue="Passport", data=df_temp)
<AxesSubplot:xlabel='ProductPitched', ylabel='count'>
# Before we start looking at the individual distributions and interactions, let's quickly check the missingness in the data.
df.isnull().sum().sort_values(ascending=False)
DurationOfPitch             251
MonthlyIncome               233
Age                         226
NumberOfTrips               140
NumberOfChildrenVisiting     66
NumberOfFollowups            45
PreferredPropertyStar        26
TypeofContact                25
Gender                        0
CityTier                      0
Occupation                    0
ProductPitched                0
NumberOfPersonVisiting        0
Designation                   0
MaritalStatus                 0
Passport                      0
PitchSatisfactionScore        0
OwnCar                        0
ProdTaken                     0
dtype: int64
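Raw missing counts are easier to judge as a share of all rows. The sketch below converts the counts printed above into percentages, taking the total of 4888 rows from the `All` margin of the crosstabs shown earlier; inside the notebook, `df.isnull().mean() * 100` would give the same figures directly.

```python
# Missing-value counts copied from the output above (columns with gaps only)
missing = {
    "DurationOfPitch": 251,
    "MonthlyIncome": 233,
    "Age": 226,
    "NumberOfTrips": 140,
    "NumberOfChildrenVisiting": 66,
    "NumberOfFollowups": 45,
    "PreferredPropertyStar": 26,
    "TypeofContact": 25,
}
n_rows = 4888  # total rows, taken from the "All" margin of the crosstabs

missing_pct = {col: 100 * count / n_rows for col, count in missing.items()}
for col, pct in missing_pct.items():
    print(f"{col:<26}{pct:5.2f}%")
```

No column is missing more than about 5% of its values, which is the kind of level where imputation is usually considered before dropping rows.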
df_XG = df.copy()
# Looking at the rows where DurationOfPitch is missing
df[df["DurationOfPitch"].isnull()]
| | ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 46 | 0 | 34.0 | Company Invited | 3 | NaN | Small Business | Male | 3 | 3.0 | Deluxe | 3.0 | Single | 1.0 | 0 | 5 | 1 | 1.0 | Manager | 19568.0 |
| 75 | 0 | 31.0 | Self Enquiry | 1 | NaN | Salaried | Female | 3 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 5 | 0 | 1.0 | Manager | NaN |
| 76 | 0 | 35.0 | Self Enquiry | 3 | NaN | Small Business | Male | 2 | 4.0 | Deluxe | 5.0 | Single | 1.0 | 0 | 2 | 0 | 1.0 | Manager | NaN |
| 84 | 0 | 34.0 | Self Enquiry | 1 | NaN | Small Business | Male | 3 | 3.0 | Deluxe | 4.0 | Divorced | 2.0 | 0 | 5 | 0 | 0.0 | Manager | NaN |
| 103 | 0 | 34.0 | Self Enquiry | 1 | NaN | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 3 | 1 | 0.0 | Manager | 19809.0 |
| 114 | 0 | 34.0 | Self Enquiry | 1 | NaN | Salaried | Female | 2 | 4.0 | Deluxe | 4.0 | Married | 7.0 | 0 | 3 | 0 | 1.0 | Manager | 19505.0 |
| 130 | 0 | 43.0 | Company Invited | 1 | NaN | Small Business | Female | 3 | 2.0 | Basic | 3.0 | Single | 5.0 | 0 | 3 | 0 | 2.0 | Executive | 19739.0 |
| 132 | 1 | 31.0 | Self Enquiry | 3 | NaN | Salaried | Female | 3 | 5.0 | Deluxe | 3.0 | Divorced | 4.0 | 1 | 3 | 1 | 0.0 | Manager | 19559.0 |
| 144 | 0 | 32.0 | Company Invited | 3 | NaN | Small Business | Male | 2 | 5.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 2 | 0 | 1.0 | Manager | 19668.0 |
| 155 | 0 | 29.0 | Company Invited | 1 | NaN | Large Business | Male | 2 | 3.0 | Deluxe | 3.0 | Divorced | 2.0 | 0 | 3 | 0 | 1.0 | Manager | NaN |
| 175 | 0 | 56.0 | Self Enquiry | 1 | NaN | Salaried | Female | 3 | 3.0 | Basic | 5.0 | Married | 5.0 | 1 | 4 | 1 | 1.0 | Executive | NaN |
| 184 | 0 | 53.0 | Self Enquiry | 1 | NaN | Small Business | Female | 2 | 2.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 2 | 1 | 1.0 | Manager | NaN |
| 196 | 0 | 35.0 | Company Invited | 1 | NaN | Small Business | Female | 2 | 3.0 | Deluxe | 4.0 | Single | 6.0 | 0 | 2 | 1 | 0.0 | Manager | NaN |
| 200 | 0 | 27.0 | Company Invited | 1 | NaN | Large Business | Male | 2 | 4.0 | Deluxe | 5.0 | Divorced | 6.0 | 0 | 3 | 0 | 1.0 | Manager | NaN |
| 208 | 0 | 40.0 | Company Invited | 1 | NaN | Salaried | Male | 3 | 4.0 | Deluxe | 3.0 | Divorced | 1.0 | 0 | 5 | 1 | 0.0 | Manager | 19876.0 |
| 224 | 0 | 31.0 | NaN | 1 | NaN | Small Business | Male | 2 | 5.0 | Deluxe | 3.0 | Divorced | 1.0 | 0 | 3 | 1 | 0.0 | Manager | NaN |
| 241 | 0 | 32.0 | Company Invited | 3 | NaN | Small Business | Male | 3 | 3.0 | Deluxe | 3.0 | Divorced | 1.0 | 0 | 4 | 1 | 2.0 | Manager | NaN |
| 255 | 0 | 25.0 | Self Enquiry | 1 | NaN | Salaried | Female | 3 | 3.0 | Deluxe | 3.0 | Divorced | 1.0 | 0 | 3 | 1 | 1.0 | Manager | 19898.0 |
| 282 | 0 | 29.0 | Company Invited | 3 | NaN | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 2 | 0 | 0.0 | Manager | 19554.0 |
| 284 | 0 | 26.0 | Company Invited | 1 | NaN | Small Business | Male | 2 | 3.0 | Deluxe | 5.0 | Divorced | 2.0 | 1 | 3 | 1 | 1.0 | Manager | 19741.0 |
| 291 | 0 | 36.0 | Self Enquiry | 1 | NaN | Large Business | Male | 1 | 3.0 | Deluxe | 4.0 | Single | 5.0 | 0 | 2 | 1 | 0.0 | Manager | 19485.0 |
| 309 | 0 | 31.0 | Self Enquiry | 1 | NaN | Large Business | Male | 2 | 3.0 | Basic | 3.0 | Divorced | 1.0 | 1 | 4 | 0 | 1.0 | Executive | 19821.0 |
| 320 | 0 | 27.0 | Self Enquiry | 3 | NaN | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 1 | 4 | 1 | 2.0 | Manager | NaN |
| 328 | 0 | 33.0 | Company Invited | 3 | NaN | Small Business | Male | 2 | 4.0 | Deluxe | 3.0 | Single | 4.0 | 0 | 3 | 1 | 1.0 | Manager | 19682.0 |
| 332 | 0 | 54.0 | Company Invited | 1 | NaN | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 4.0 | 0 | 4 | 1 | 0.0 | Manager | 19869.0 |
| 349 | 0 | 29.0 | Company Invited | 3 | NaN | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Divorced | 2.0 | 0 | 2 | 1 | 1.0 | Manager | 19649.0 |
| 354 | 0 | 30.0 | Company Invited | 3 | NaN | Large Business | Female | 2 | 3.0 | Deluxe | 3.0 | Divorced | 1.0 | 1 | 3 | 1 | 0.0 | Manager | 19736.0 |
| 380 | 0 | 24.0 | Self Enquiry | 3 | NaN | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 1 | 2 | 0 | 2.0 | Manager | NaN |
| 394 | 0 | 31.0 | Self Enquiry | 1 | NaN | Small Business | Female | 2 | 3.0 | Deluxe | 5.0 | Divorced | 2.0 | 0 | 2 | 1 | 0.0 | Manager | NaN |
| 396 | 0 | 43.0 | Self Enquiry | 1 | NaN | Salaried | Female | 3 | 3.0 | Deluxe | 3.0 | Married | 5.0 | 1 | 2 | 0 | 0.0 | Manager | 19522.0 |
| 397 | 0 | 25.0 | Self Enquiry | 3 | NaN | Salaried | Female | 2 | 4.0 | Deluxe | 3.0 | Single | 2.0 | 1 | 2 | 0 | 1.0 | Manager | 19487.0 |
| 398 | 0 | 37.0 | Company Invited | 1 | NaN | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Divorced | 4.0 | 1 | 3 | 1 | 1.0 | Manager | NaN |
| 404 | 0 | 28.0 | Self Enquiry | 1 | NaN | Small Business | Male | 3 | 3.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 2 | 0 | 0.0 | Manager | 19558.0 |
| 409 | 0 | 42.0 | Company Invited | 1 | NaN | Salaried | Female | 3 | 3.0 | Deluxe | 3.0 | Married | 3.0 | 0 | 3 | 1 | 2.0 | Manager | 19556.0 |
| 412 | 0 | 46.0 | Self Enquiry | 1 | NaN | Small Business | Female | 2 | 3.0 | Deluxe | 3.0 | Married | 3.0 | 0 | 5 | 0 | 0.0 | Manager | 19810.0 |
| 413 | 0 | 42.0 | Company Invited | 1 | NaN | Large Business | Female | 2 | 4.0 | Deluxe | 3.0 | Divorced | 1.0 | 0 | 3 | 0 | 0.0 | Manager | 19523.0 |
| 447 | 0 | 35.0 | Self Enquiry | 3 | NaN | Small Business | Male | 2 | 3.0 | Deluxe | 3.0 | Divorced | 1.0 | 0 | 3 | 1 | 0.0 | Manager | 19717.0 |
| 452 | 0 | 45.0 | Self Enquiry | 3 | NaN | Salaried | Male | 3 | 3.0 | Deluxe | 4.0 | Divorced | 1.0 | 0 | 5 | 1 | 0.0 | Manager | 19805.0 |
| 454 | 0 | 29.0 | Self Enquiry | 1 | NaN | Large Business | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 5.0 | 0 | 5 | 0 | 0.0 | Manager | NaN |
| 460 | 0 | 26.0 | Self Enquiry | 3 | NaN | Small Business | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 6.0 | 0 | 5 | 1 | 0.0 | Manager | NaN |
| 461 | 0 | 35.0 | Self Enquiry | 3 | NaN | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 4 | 0 | 2.0 | Manager | 19859.0 |
| 469 | 1 | 32.0 | Company Invited | 3 | NaN | Salaried | Male | 2 | 2.0 | Deluxe | 3.0 | Divorced | 1.0 | 0 | 3 | 1 | 1.0 | Manager | 19707.0 |
| 504 | 1 | 45.0 | Company Invited | 3 | NaN | Salaried | Female | 3 | 3.0 | Deluxe | 5.0 | Divorced | 3.0 | 0 | 2 | 0 | 0.0 | Manager | NaN |
| 517 | 0 | 25.0 | Self Enquiry | 3 | NaN | Salaried | Male | 2 | 2.0 | Deluxe | 4.0 | Divorced | 1.0 | 0 | 4 | 0 | 0.0 | Manager | 19851.0 |
| 521 | 0 | 27.0 | Company Invited | 3 | NaN | Small Business | Female | 3 | 2.0 | Deluxe | 3.0 | Married | 2.0 | 1 | 2 | 1 | 2.0 | Manager | 19647.0 |
| 522 | 0 | 37.0 | Self Enquiry | 1 | NaN | Salaried | Male | 3 | 2.0 | Basic | 3.0 | Single | 4.0 | 0 | 4 | 1 | 1.0 | Executive | 19680.0 |
| 525 | 1 | 24.0 | Self Enquiry | 3 | NaN | Salaried | Female | 3 | 3.0 | Deluxe | 3.0 | Single | 1.0 | 0 | 2 | 1 | 0.0 | Manager | 19577.0 |
| 526 | 0 | 39.0 | Self Enquiry | 1 | NaN | Large Business | Female | 3 | 4.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 2 | 1 | 0.0 | Manager | 19553.0 |
| 570 | 0 | 52.0 | Company Invited | 1 | NaN | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Divorced | 1.0 | 0 | 3 | 1 | 0.0 | Executive | NaN |
| 571 | 0 | 26.0 | NaN | 1 | NaN | Salaried | Female | 3 | 5.0 | Basic | 3.0 | Married | 4.0 | 0 | 4 | 1 | 2.0 | Executive | NaN |
| 572 | 0 | 29.0 | NaN | 1 | NaN | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Divorced | 5.0 | 0 | 2 | 1 | 0.0 | Manager | NaN |
| 576 | 0 | 27.0 | NaN | 3 | NaN | Small Business | Male | 2 | 3.0 | Deluxe | 3.0 | Divorced | 1.0 | 0 | 3 | 0 | 1.0 | Manager | NaN |
| 579 | 0 | 34.0 | NaN | 1 | NaN | Small Business | Female | 2 | 4.0 | Basic | 5.0 | Single | 2.0 | 0 | 2 | 1 | 1.0 | Executive | NaN |
| 582 | 0 | 40.0 | Company Invited | 1 | NaN | Small Business | Female | 2 | 1.0 | Deluxe | 4.0 | Divorced | 2.0 | 0 | 3 | 0 | 1.0 | Manager | NaN |
| 598 | 1 | 28.0 | NaN | 1 | NaN | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Single | 7.0 | 0 | 3 | 0 | 0.0 | Executive | NaN |
| 604 | 0 | 42.0 | Self Enquiry | 1 | NaN | Salaried | Male | 3 | 3.0 | Deluxe | 4.0 | Divorced | 2.0 | 0 | 3 | 1 | 1.0 | Manager | NaN |
| 612 | 0 | 28.0 | Self Enquiry | 3 | NaN | Small Business | Female | 2 | 3.0 | Deluxe | 4.0 | Divorced | 2.0 | 1 | 3 | 1 | 0.0 | Manager | 19779.0 |
| 622 | 0 | 32.0 | NaN | 3 | NaN | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Married | 3.0 | 0 | 2 | 0 | 0.0 | Manager | NaN |
| 630 | 0 | 22.0 | Self Enquiry | 1 | NaN | Salaried | Male | 2 | 4.0 | Deluxe | 3.0 | Divorced | 7.0 | 0 | 2 | 1 | 0.0 | Manager | 19775.0 |
| 638 | 0 | 25.0 | Self Enquiry | 3 | NaN | Small Business | Male | 2 | 4.0 | Deluxe | 5.0 | Divorced | 2.0 | 0 | 3 | 1 | 1.0 | Manager | NaN |
| 651 | 0 | 47.0 | Self Enquiry | 3 | NaN | Small Business | Female | 2 | 3.0 | Deluxe | 3.0 | Divorced | 1.0 | 1 | 2 | 1 | 0.0 | Manager | 19537.0 |
| 661 | 0 | 43.0 | Self Enquiry | 1 | NaN | Salaried | Female | 2 | 3.0 | Deluxe | 4.0 | Married | 5.0 | 0 | 3 | 1 | 0.0 | Manager | 19765.0 |
| 680 | 0 | 36.0 | Self Enquiry | 1 | NaN | Salaried | Male | 3 | 3.0 | Basic | 3.0 | Single | 3.0 | 0 | 3 | 0 | 2.0 | Executive | 19678.0 |
| 685 | 0 | 26.0 | Company Invited | 3 | NaN | Small Business | Male | 2 | 4.0 | Deluxe | 5.0 | Single | 2.0 | 0 | 3 | 1 | 1.0 | Manager | NaN |
| 686 | 0 | 41.0 | Self Enquiry | 1 | NaN | Small Business | Male | 2 | 3.0 | Basic | 5.0 | Single | 3.0 | 1 | 4 | 1 | 0.0 | Executive | 19721.0 |
| 696 | 0 | 45.0 | Company Invited | 1 | NaN | Salaried | Male | 2 | 3.0 | Deluxe | 4.0 | Divorced | 2.0 | 0 | 4 | 1 | 1.0 | Manager | NaN |
| 698 | 0 | 35.0 | Self Enquiry | 3 | NaN | Small Business | Female | 2 | 3.0 | Deluxe | 3.0 | Divorced | 2.0 | 0 | 4 | 1 | 0.0 | Manager | 19601.0 |
| 724 | 0 | 24.0 | NaN | 1 | NaN | Small Business | Female | 2 | 4.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 3 | 1 | 1.0 | Manager | NaN |
| 735 | 0 | 48.0 | Self Enquiry | 1 | NaN | Salaried | Male | 3 | 4.0 | Deluxe | 3.0 | Single | 3.0 | 0 | 2 | 0 | 0.0 | Manager | NaN |
| 744 | 1 | 37.0 | Self Enquiry | 1 | NaN | Small Business | Female | 3 | 5.0 | Deluxe | 4.0 | Divorced | 6.0 | 0 | 4 | 1 | 0.0 | Manager | 19777.0 |
| 761 | 1 | 36.0 | Self Enquiry | 1 | NaN | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 4 | 0 | 0.0 | Manager | 19834.0 |
| 778 | 0 | 46.0 | Self Enquiry | 1 | NaN | Salaried | Female | 3 | 4.0 | Deluxe | 5.0 | Married | 1.0 | 1 | 2 | 1 | 2.0 | Manager | 19615.0 |
| 786 | 0 | 27.0 | Company Invited | 1 | NaN | Salaried | Male | 2 | 5.0 | Basic | 3.0 | Divorced | 2.0 | 0 | 2 | 1 | 1.0 | Executive | 19621.0 |
| 792 | 1 | 33.0 | Company Invited | 1 | NaN | Small Business | Female | 2 | 4.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 5 | 1 | 0.0 | Manager | 19508.0 |
| 801 | 1 | 50.0 | Company Invited | 3 | NaN | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Single | 4.0 | 1 | 2 | 0 | 0.0 | Manager | 19728.0 |
| 802 | 0 | 33.0 | Company Invited | 3 | NaN | Salaried | Female | 1 | 3.0 | Deluxe | 4.0 | Divorced | 1.0 | 0 | 4 | 1 | 0.0 | Manager | NaN |
| 824 | 0 | 42.0 | Self Enquiry | 1 | NaN | Small Business | Male | 2 | 5.0 | Deluxe | 3.0 | Single | 5.0 | 0 | 3 | 1 | 0.0 | Manager | NaN |
| 830 | 0 | 41.0 | Self Enquiry | 1 | NaN | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Divorced | 4.0 | 1 | 3 | 1 | 0.0 | Executive | 19766.0 |
| 835 | 0 | 35.0 | Self Enquiry | 2 | NaN | Large Business | Male | 3 | 3.0 | Basic | 3.0 | Single | 2.0 | 0 | 5 | 0 | 1.0 | Executive | NaN |
| 843 | 0 | 26.0 | NaN | 1 | NaN | Small Business | Male | 2 | 1.0 | Basic | 3.0 | Divorced | 2.0 | 0 | 5 | 1 | 1.0 | Executive | NaN |
| 845 | 0 | 40.0 | Company Invited | 1 | NaN | Small Business | Female | 3 | 4.0 | Deluxe | 3.0 | Divorced | 4.0 | 1 | 2 | 1 | 0.0 | Manager | NaN |
| 854 | 0 | 45.0 | Self Enquiry | 1 | NaN | Small Business | Female | 2 | 3.0 | Basic | 3.0 | Divorced | 5.0 | 1 | 3 | 1 | 0.0 | Executive | NaN |
| 866 | 0 | 40.0 | Company Invited | 3 | NaN | Small Business | Male | 3 | 3.0 | Deluxe | 4.0 | Divorced | 6.0 | 0 | 4 | 1 | 2.0 | Manager | NaN |
| 872 | 0 | 33.0 | Company Invited | 3 | NaN | Small Business | Female | 2 | 3.0 | Deluxe | 3.0 | Divorced | 2.0 | 0 | 3 | 0 | 1.0 | Manager | 19539.0 |
| 875 | 0 | 44.0 | Self Enquiry | 1 | NaN | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 5 | 1 | 1.0 | Manager | 19541.0 |
| 917 | 0 | 34.0 | Self Enquiry | 3 | NaN | Small Business | Female | 2 | 3.0 | Deluxe | 5.0 | Single | 1.0 | 1 | 3 | 1 | 0.0 | Manager | 19538.0 |
| 920 | 0 | 34.0 | Company Invited | 1 | NaN | Small Business | Female | 2 | 3.0 | Deluxe | 4.0 | Married | 5.0 | 0 | 3 | 1 | 0.0 | Manager | NaN |
| 923 | 0 | 34.0 | Company Invited | 2 | NaN | Salaried | Male | 2 | 4.0 | Deluxe | 4.0 | Divorced | 5.0 | 0 | 5 | 0 | 1.0 | Manager | 19490.0 |
| 931 | 0 | 30.0 | Company Invited | 1 | NaN | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Single | 1.0 | 1 | 3 | 1 | 2.0 | Manager | 19695.0 |
| 939 | 1 | 32.0 | Self Enquiry | 1 | NaN | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Divorced | 2.0 | 0 | 3 | 1 | 1.0 | Manager | 19883.0 |
| 941 | 0 | 30.0 | Self Enquiry | 1 | NaN | Large Business | Female | 2 | 4.0 | Deluxe | 3.0 | Divorced | 1.0 | 0 | 4 | 1 | 1.0 | Manager | 19627.0 |
| 949 | 0 | 39.0 | Self Enquiry | 1 | NaN | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Single | 1.0 | 0 | 3 | 0 | 0.0 | Manager | 19534.0 |
| 959 | 0 | 40.0 | Self Enquiry | 1 | NaN | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 3 | 1 | 2.0 | Manager | 19661.0 |
| 961 | 0 | 35.0 | Company Invited | 1 | NaN | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 1 | 1 | 1 | 1.0 | Manager | NaN |
| 968 | 0 | 36.0 | Company Invited | 3 | NaN | Small Business | Female | 2 | 1.0 | Deluxe | 5.0 | Divorced | 3.0 | 0 | 1 | 1 | 0.0 | Manager | 19639.0 |
| 981 | 1 | 35.0 | Company Invited | 3 | NaN | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Divorced | 1.0 | 1 | 5 | 1 | 2.0 | Manager | 19614.0 |
| 984 | 0 | 28.0 | Self Enquiry | 3 | NaN | Salaried | Male | 2 | 4.0 | Deluxe | 5.0 | Divorced | 2.0 | 0 | 4 | 1 | 0.0 | Manager | 19724.0 |
| 1006 | 1 | 49.0 | Company Invited | 1 | NaN | Salaried | Male | 3 | 4.0 | Deluxe | 5.0 | Single | 4.0 | 0 | 3 | 0 | 1.0 | Manager | NaN |
| 1013 | 0 | 30.0 | Self Enquiry | 3 | NaN | Small Business | Female | 3 | 3.0 | Deluxe | 5.0 | Married | 1.0 | 0 | 3 | 1 | 0.0 | Manager | 19779.0 |
| 1021 | 1 | 25.0 | NaN | 3 | NaN | Salaried | Male | 3 | 4.0 | Basic | 5.0 | Divorced | 4.0 | 0 | 1 | 1 | 0.0 | Executive | NaN |
| 1047 | 0 | 33.0 | NaN | 3 | NaN | Small Business | Male | 2 | 3.0 | Deluxe | 5.0 | Divorced | 1.0 | 0 | 3 | 0 | 0.0 | Manager | NaN |
| 1048 | 0 | 34.0 | Self Enquiry | 3 | NaN | Salaried | Male | 2 | 5.0 | Deluxe | 5.0 | Single | 4.0 | 0 | 5 | 1 | 1.0 | Manager | 19759.0 |
| 1051 | 0 | 44.0 | Company Invited | 3 | NaN | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Divorced | 1.0 | 0 | 4 | 1 | 1.0 | Manager | 19768.0 |
| 1058 | 1 | 34.0 | Self Enquiry | 3 | NaN | Small Business | Female | 2 | 4.0 | Deluxe | 4.0 | Single | 1.0 | 1 | 5 | 1 | 0.0 | Manager | 19599.0 |
| 1067 | 0 | 47.0 | Self Enquiry | 3 | NaN | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Divorced | 4.0 | 1 | 5 | 1 | 0.0 | Manager | 19591.0 |
| 1070 | 0 | 28.0 | Company Invited | 3 | NaN | Salaried | Male | 2 | 3.0 | Deluxe | 5.0 | Single | 1.0 | 0 | 3 | 1 | 1.0 | Manager | 19898.0 |
| 1071 | 0 | 49.0 | Self Enquiry | 1 | NaN | Small Business | Female | 2 | 4.0 | Deluxe | 5.0 | Divorced | 5.0 | 0 | 3 | 1 | 1.0 | Manager | 19789.0 |
| 1088 | 0 | 42.0 | Self Enquiry | 1 | NaN | Small Business | Male | 3 | 4.0 | Basic | 4.0 | Married | 5.0 | 0 | 1 | 1 | 2.0 | Executive | 19841.0 |
| 1089 | 0 | 37.0 | Self Enquiry | 1 | NaN | Small Business | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 1 | 0 | 0.0 | Manager | NaN |
| 1091 | 0 | 33.0 | Self Enquiry | 1 | NaN | Salaried | Male | 2 | 4.0 | Deluxe | 4.0 | Single | 2.0 | 0 | 1 | 0 | 1.0 | Manager | NaN |
| 1112 | 1 | 38.0 | Self Enquiry | 1 | NaN | Small Business | Male | 2 | 3.0 | Deluxe | 4.0 | Married | 5.0 | 0 | 4 | 0 | 0.0 | Manager | 19855.0 |
| 1122 | 0 | 29.0 | Self Enquiry | 1 | NaN | Small Business | Male | 2 | 3.0 | Basic | 5.0 | Single | 2.0 | 1 | 3 | 0 | 1.0 | Executive | 19723.0 |
| 1132 | 0 | 40.0 | Self Enquiry | 3 | NaN | Salaried | Female | 2 | 3.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 3 | 0 | 1.0 | Manager | 19639.0 |
| 1133 | 0 | 43.0 | Self Enquiry | 1 | NaN | Large Business | Male | 2 | 1.0 | Basic | 4.0 | Married | 6.0 | 0 | 5 | 1 | 1.0 | Executive | 19876.0 |
| 1143 | 0 | 45.0 | NaN | 3 | NaN | Small Business | Male | 2 | 4.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 3 | 0 | 0.0 | Manager | NaN |
| 1145 | 0 | 36.0 | Self Enquiry | 1 | NaN | Salaried | Female | 3 | 3.0 | Deluxe | 3.0 | Married | 1.0 | 1 | 1 | 0 | 2.0 | Manager | 19663.0 |
| 1146 | 0 | 34.0 | Company Invited | 1 | NaN | Salaried | Male | 2 | 1.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 4 | 1 | 0.0 | Manager | 19724.0 |
| 1182 | 0 | 36.0 | NaN | 1 | NaN | Small Business | Female | 2 | 4.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 5 | 1 | 1.0 | Manager | NaN |
| 1186 | 1 | 35.0 | Company Invited | 3 | NaN | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Single | 4.0 | 1 | 3 | 1 | 0.0 | Manager | 19581.0 |
| 1187 | 0 | 38.0 | Company Invited | 1 | NaN | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 5 | 1 | 0.0 | Manager | 19735.0 |
| 1217 | 0 | 24.0 | NaN | 1 | NaN | Small Business | Male | 3 | 1.0 | Basic | 3.0 | Married | 2.0 | 0 | 1 | 0 | 0.0 | Executive | NaN |
| 1220 | 0 | 36.0 | Self Enquiry | 3 | NaN | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 4.0 | 0 | 4 | 1 | 1.0 | Manager | 19502.0 |
| 1254 | 0 | 49.0 | Self Enquiry | 3 | NaN | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Single | 4.0 | 0 | 4 | 0 | 1.0 | Manager | 19507.0 |
| 1304 | 0 | 40.0 | Self Enquiry | 1 | NaN | Salaried | Female | 2 | 3.0 | Deluxe | 5.0 | Married | 3.0 | 0 | 1 | 1 | 1.0 | Manager | NaN |
| 1309 | 0 | 26.0 | Self Enquiry | 3 | NaN | Small Business | Male | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 1 | 0 | 0.0 | Manager | 19684.0 |
| 1319 | 0 | 32.0 | Company Invited | 3 | NaN | Small Business | Male | 3 | 3.0 | Deluxe | 3.0 | Single | 1.0 | 0 | 3 | 1 | 1.0 | Manager | 19648.0 |
| 1334 | 0 | 27.0 | Company Invited | 1 | NaN | Salaried | Female | 3 | 4.0 | Basic | 5.0 | Married | 1.0 | 0 | 4 | 1 | 0.0 | Executive | 19774.0 |
| 1344 | 0 | 37.0 | Self Enquiry | 1 | NaN | Small Business | Male | 3 | 3.0 | Deluxe | 5.0 | Married | 6.0 | 1 | 3 | 0 | 0.0 | Manager | NaN |
| 1345 | 0 | 35.0 | Self Enquiry | 1 | NaN | Salaried | Female | 2 | 4.0 | Deluxe | 4.0 | Married | 1.0 | 1 | 1 | 1 | 1.0 | Manager | 19788.0 |
| 1356 | 0 | 41.0 | NaN | 3 | NaN | Small Business | Female | 2 | 3.0 | Deluxe | 4.0 | Married | 6.0 | 0 | 3 | 1 | 1.0 | Manager | NaN |
| 1376 | 0 | 38.0 | Self Enquiry | 1 | NaN | Salaried | Male | 3 | 3.0 | Basic | 3.0 | Married | 3.0 | 0 | 1 | 1 | 1.0 | Executive | 19771.0 |
| 1404 | 0 | 42.0 | Company Invited | 1 | NaN | Salaried | Male | 2 | 4.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 3 | 1 | 1.0 | Manager | NaN |
| 1406 | 0 | 54.0 | Self Enquiry | 1 | NaN | Small Business | Female | 3 | 3.0 | Deluxe | 5.0 | Single | 7.0 | 1 | 1 | 0 | 0.0 | Manager | NaN |
| 1407 | 0 | 24.0 | Self Enquiry | 1 | NaN | Salaried | Male | 2 | 4.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 3 | 1 | 1.0 | Manager | 19617.0 |
| 1416 | 0 | 38.0 | Self Enquiry | 3 | NaN | Salaried | Male | 2 | 3.0 | Deluxe | 4.0 | Married | 1.0 | 0 | 4 | 1 | 1.0 | Manager | NaN |
| 1425 | 0 | 33.0 | Self Enquiry | 1 | NaN | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Married | 1.0 | 1 | 1 | 1 | 1.0 | Manager | 19878.0 |
| 1442 | 1 | 29.0 | Self Enquiry | 1 | NaN | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Married | 1.0 | 1 | 3 | 1 | 0.0 | Executive | 19787.0 |
| 1454 | 0 | 45.0 | Self Enquiry | 3 | NaN | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 1.0 | 0 | 1 | 1 | 0.0 | Manager | 19850.0 |
| 1469 | 0 | 34.0 | NaN | 1 | NaN | Small Business | Male | 2 | 1.0 | Deluxe | 3.0 | Married | 3.0 | 0 | 3 | 0 | 1.0 | Manager | NaN |
| 1516 | 0 | 34.0 | Company Invited | 3 | NaN | Small Business | Male | 3 | 3.0 | Deluxe | 3.0 | Single | 1.0 | 0 | 5 | 0 | 1.0 | Manager | 19568.0 |
| 1545 | 0 | 31.0 | Self Enquiry | 1 | NaN | Salaried | Female | 3 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 5 | 1 | 2.0 | Manager | NaN |
| 1546 | 0 | 35.0 | Self Enquiry | 3 | NaN | Small Business | Male | 2 | 4.0 | Deluxe | 5.0 | Single | 1.0 | 0 | 1 | 1 | 1.0 | Manager | NaN |
| 1554 | 0 | 34.0 | Self Enquiry | 1 | NaN | Small Business | Male | 3 | 3.0 | Deluxe | 4.0 | Married | 2.0 | 0 | 5 | 1 | 2.0 | Manager | NaN |
| 1573 | 0 | 34.0 | Self Enquiry | 1 | NaN | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 3 | 0 | 0.0 | Manager | 19809.0 |
| 1584 | 0 | 34.0 | Self Enquiry | 1 | NaN | Salaried | Female | 2 | 4.0 | Deluxe | 4.0 | Married | 7.0 | 0 | 3 | 1 | 0.0 | Manager | 19505.0 |
| 1600 | 0 | 43.0 | Company Invited | 1 | NaN | Small Business | Female | 3 | 1.0 | Basic | 3.0 | Single | 5.0 | 0 | 3 | 0 | 2.0 | Executive | 19739.0 |
| 1602 | 1 | 31.0 | Self Enquiry | 3 | NaN | Salaried | Female | 3 | 5.0 | Deluxe | 3.0 | Married | 4.0 | 1 | 3 | 0 | 2.0 | Manager | 19559.0 |
| 1612 | 0 | 38.0 | Self Enquiry | 1 | NaN | Large Business | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 4.0 | 1 | 5 | 1 | 0.0 | Manager | NaN |
| 1614 | 0 | 32.0 | Company Invited | 3 | NaN | Small Business | Male | 2 | 5.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 1 | 0 | 1.0 | Manager | 19668.0 |
| 1625 | 0 | 29.0 | Company Invited | 1 | NaN | Large Business | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 3 | 0 | 0.0 | Manager | NaN |
| 1645 | 0 | 56.0 | Self Enquiry | 1 | NaN | Salaried | Female | 3 | 3.0 | Basic | 5.0 | Married | 5.0 | 1 | 4 | 1 | 1.0 | Executive | NaN |
| 1654 | 0 | 53.0 | Self Enquiry | 1 | NaN | Small Business | Female | 2 | 1.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 1 | 1 | 0.0 | Manager | NaN |
| 1666 | 0 | 35.0 | Company Invited | 1 | NaN | Small Business | Female | 2 | 3.0 | Deluxe | 4.0 | Single | 6.0 | 0 | 1 | 0 | 0.0 | Manager | NaN |
| 1670 | 0 | 27.0 | Company Invited | 1 | NaN | Large Business | Male | 2 | 4.0 | Deluxe | 5.0 | Married | 6.0 | 0 | 3 | 1 | 0.0 | Manager | NaN |
| 1678 | 0 | 40.0 | Company Invited | 1 | NaN | Salaried | Male | 3 | 4.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 5 | 0 | 1.0 | Manager | 19876.0 |
| 1694 | 0 | 31.0 | NaN | 1 | NaN | Small Business | Male | 2 | 5.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 3 | 0 | 0.0 | Manager | NaN |
| 1711 | 0 | 32.0 | Company Invited | 3 | NaN | Small Business | Male | 3 | 3.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 4 | 1 | 1.0 | Manager | NaN |
| 1725 | 0 | 25.0 | Self Enquiry | 1 | NaN | Salaried | Female | 3 | 3.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 3 | 1 | 1.0 | Manager | 19898.0 |
| 1752 | 0 | 29.0 | Company Invited | 3 | NaN | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 1 | 0 | 1.0 | Manager | 19554.0 |
| 1754 | 0 | 26.0 | Company Invited | 1 | NaN | Small Business | Male | 2 | 3.0 | Deluxe | 5.0 | Married | 2.0 | 1 | 3 | 1 | 0.0 | Manager | 19741.0 |
| 1761 | 0 | 36.0 | Self Enquiry | 1 | NaN | Large Business | Male | 1 | 3.0 | Deluxe | 4.0 | Single | 5.0 | 0 | 1 | 1 | 0.0 | Manager | 19485.0 |
| 1779 | 0 | 31.0 | Self Enquiry | 1 | NaN | Large Business | Male | 2 | 3.0 | Basic | 3.0 | Married | 1.0 | 1 | 4 | 1 | 0.0 | Executive | 19821.0 |
| 1790 | 0 | 27.0 | Self Enquiry | 3 | NaN | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 1 | 4 | 0 | 1.0 | Manager | NaN |
| 1798 | 0 | 33.0 | Company Invited | 3 | NaN | Small Business | Male | 2 | 4.0 | Deluxe | 3.0 | Single | 4.0 | 0 | 3 | 1 | 0.0 | Manager | 19682.0 |
| 1802 | 0 | 54.0 | Company Invited | 1 | NaN | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 4.0 | 0 | 4 | 0 | 1.0 | Manager | 19869.0 |
| 1819 | 0 | 29.0 | Company Invited | 3 | NaN | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 1 | 1 | 1.0 | Manager | 19649.0 |
| 1824 | 0 | 30.0 | Company Invited | 3 | NaN | Large Business | Female | 2 | 3.0 | Deluxe | 3.0 | Married | 1.0 | 1 | 3 | 1 | 1.0 | Manager | 19736.0 |
| 1850 | 0 | 24.0 | Self Enquiry | 3 | NaN | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 1 | 1 | 0 | 0.0 | Manager | NaN |
| 1864 | 0 | 31.0 | Self Enquiry | 1 | NaN | Small Business | Female | 2 | 3.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 1 | 1 | 0.0 | Manager | NaN |
| 1866 | 0 | 43.0 | Self Enquiry | 1 | NaN | Salaried | Female | 3 | 3.0 | Deluxe | 3.0 | Married | 5.0 | 1 | 1 | 1 | 0.0 | Manager | 19522.0 |
| 1867 | 0 | 25.0 | Self Enquiry | 3 | NaN | Salaried | Female | 3 | 4.0 | Deluxe | 3.0 | Single | 2.0 | 1 | 1 | 0 | 2.0 | Manager | 19487.0 |
| 1868 | 0 | 37.0 | Company Invited | 1 | NaN | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Married | 4.0 | 1 | 3 | 1 | 2.0 | Manager | NaN |
| 1874 | 0 | 28.0 | Self Enquiry | 1 | NaN | Small Business | Male | 3 | 3.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 1 | 0 | 1.0 | Manager | 19558.0 |
| 1879 | 0 | 42.0 | Company Invited | 1 | NaN | Salaried | Female | 3 | 3.0 | Deluxe | 3.0 | Married | 3.0 | 0 | 3 | 1 | 1.0 | Manager | 19556.0 |
| 1882 | 0 | 46.0 | Self Enquiry | 1 | NaN | Small Business | Female | 2 | 3.0 | Deluxe | 3.0 | Married | 3.0 | 0 | 5 | 1 | 1.0 | Manager | 19810.0 |
| 1883 | 0 | 42.0 | Company Invited | 1 | NaN | Large Business | Female | 2 | 4.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 3 | 0 | 1.0 | Manager | 19523.0 |
| 1917 | 0 | 35.0 | Self Enquiry | 3 | NaN | Small Business | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 3 | 0 | 1.0 | Manager | 19717.0 |
| 1922 | 0 | 45.0 | Self Enquiry | 3 | NaN | Salaried | Male | 3 | 3.0 | Deluxe | 4.0 | Married | 1.0 | 0 | 5 | 1 | 2.0 | Manager | 19805.0 |
| 1924 | 0 | 29.0 | Self Enquiry | 1 | NaN | Large Business | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 5.0 | 0 | 5 | 1 | 0.0 | Manager | NaN |
| 1930 | 0 | 26.0 | Self Enquiry | 3 | NaN | Small Business | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 6.0 | 0 | 5 | 0 | 0.0 | Manager | NaN |
| 1931 | 0 | 35.0 | Self Enquiry | 3 | NaN | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 4 | 1 | 0.0 | Manager | 19859.0 |
| 1939 | 1 | 32.0 | Company Invited | 3 | NaN | Salaried | Male | 2 | 1.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 3 | 1 | 0.0 | Manager | 19707.0 |
| 1952 | 1 | 31.0 | Self Enquiry | 3 | NaN | Small Business | Male | 2 | 3.0 | Deluxe | 5.0 | Married | 3.0 | 0 | 1 | 0 | 1.0 | Manager | NaN |
| 1974 | 1 | 45.0 | Company Invited | 3 | NaN | Salaried | Female | 3 | 3.0 | Deluxe | 5.0 | Married | 3.0 | 0 | 1 | 0 | 2.0 | Manager | NaN |
| 1987 | 0 | 25.0 | Self Enquiry | 3 | NaN | Salaried | Male | 2 | 1.0 | Deluxe | 4.0 | Married | 1.0 | 0 | 4 | 1 | 0.0 | Manager | 19851.0 |
| 1991 | 0 | 27.0 | Company Invited | 3 | NaN | Small Business | Female | 3 | 1.0 | Deluxe | 3.0 | Married | 2.0 | 1 | 1 | 1 | 0.0 | Manager | 19647.0 |
| 1992 | 0 | 37.0 | Self Enquiry | 1 | NaN | Salaried | Male | 3 | 1.0 | Basic | 3.0 | Single | 4.0 | 0 | 4 | 1 | 2.0 | Executive | 19680.0 |
| 1995 | 1 | 24.0 | Self Enquiry | 3 | NaN | Salaried | Female | 3 | 3.0 | Deluxe | 3.0 | Single | 1.0 | 0 | 1 | 1 | 1.0 | Manager | 19577.0 |
| 1996 | 0 | 39.0 | Self Enquiry | 1 | NaN | Large Business | Female | 3 | 4.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 1 | 1 | 1.0 | Manager | 19553.0 |
| 2040 | 0 | 52.0 | Company Invited | 1 | NaN | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Married | 1.0 | 0 | 3 | 1 | 0.0 | Executive | NaN |
| 2041 | 0 | 26.0 | NaN | 1 | NaN | Salaried | Female | 3 | 5.0 | Basic | 3.0 | Married | 4.0 | 0 | 4 | 1 | 0.0 | Executive | NaN |
| 2042 | 0 | 29.0 | NaN | 1 | NaN | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Married | 5.0 | 0 | 1 | 0 | 1.0 | Manager | NaN |
| 2046 | 0 | 27.0 | NaN | 3 | NaN | Small Business | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 3 | 1 | 1.0 | Manager | NaN |
| 2049 | 0 | 34.0 | NaN | 1 | NaN | Small Business | Female | 2 | 4.0 | Basic | 5.0 | Single | 2.0 | 0 | 1 | 1 | 0.0 | Executive | NaN |
| 2052 | 0 | 40.0 | Company Invited | 1 | NaN | Small Business | Female | 2 | 1.0 | Deluxe | 4.0 | Married | 2.0 | 0 | 3 | 0 | 0.0 | Manager | NaN |
| 2068 | 1 | 28.0 | NaN | 1 | NaN | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Single | 7.0 | 0 | 3 | 1 | 1.0 | Executive | NaN |
| 2074 | 0 | 42.0 | Self Enquiry | 1 | NaN | Salaried | Male | 3 | 3.0 | Deluxe | 4.0 | Married | 2.0 | 0 | 3 | 1 | 0.0 | Manager | NaN |
| 2082 | 0 | 28.0 | Self Enquiry | 3 | NaN | Small Business | Female | 2 | 3.0 | Deluxe | 4.0 | Married | 2.0 | 1 | 3 | 1 | 1.0 | Manager | 19779.0 |
| 2092 | 0 | 32.0 | NaN | 3 | NaN | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Married | 3.0 | 0 | 1 | 0 | 2.0 | Manager | NaN |
| 2100 | 0 | 22.0 | Self Enquiry | 1 | NaN | Salaried | Male | 2 | 4.0 | Deluxe | 3.0 | Married | 7.0 | 0 | 1 | 1 | 1.0 | Manager | 19775.0 |
| 2108 | 0 | 25.0 | Self Enquiry | 3 | NaN | Small Business | Male | 2 | 4.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 3 | 1 | 1.0 | Manager | NaN |
| 2121 | 0 | 47.0 | Self Enquiry | 3 | NaN | Small Business | Female | 2 | 3.0 | Deluxe | 3.0 | Married | 1.0 | 1 | 1 | 1 | 1.0 | Manager | 19537.0 |
| 2131 | 0 | 43.0 | Self Enquiry | 1 | NaN | Salaried | Female | 2 | 3.0 | Deluxe | 4.0 | Married | 5.0 | 0 | 3 | 1 | 0.0 | Manager | 19765.0 |
| 2150 | 0 | 36.0 | Self Enquiry | 1 | NaN | Salaried | Male | 3 | 3.0 | Basic | 3.0 | Single | 3.0 | 0 | 3 | 1 | 2.0 | Executive | 19678.0 |
| 2155 | 0 | 26.0 | Company Invited | 3 | NaN | Small Business | Male | 2 | 4.0 | Deluxe | 5.0 | Single | 2.0 | 0 | 3 | 0 | 0.0 | Manager | NaN |
| 2156 | 0 | 41.0 | Self Enquiry | 1 | NaN | Small Business | Male | 2 | 3.0 | Basic | 5.0 | Single | 3.0 | 1 | 4 | 1 | 1.0 | Executive | 19721.0 |
| 2166 | 0 | 45.0 | Company Invited | 1 | NaN | Salaried | Male | 2 | 3.0 | Deluxe | 4.0 | Married | 2.0 | 0 | 4 | 1 | 0.0 | Manager | NaN |
| 2168 | 0 | 35.0 | Self Enquiry | 3 | NaN | Small Business | Female | 2 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 4 | 1 | 1.0 | Manager | 19601.0 |
| 2194 | 0 | 24.0 | NaN | 1 | NaN | Small Business | Female | 2 | 4.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 3 | 0 | 0.0 | Manager | NaN |
| 2205 | 0 | 48.0 | Self Enquiry | 1 | NaN | Salaried | Male | 3 | 4.0 | Deluxe | 3.0 | Single | 3.0 | 0 | 1 | 0 | 2.0 | Manager | NaN |
| 2214 | 1 | 37.0 | Self Enquiry | 1 | NaN | Small Business | Female | 3 | 5.0 | Deluxe | 4.0 | Married | 6.0 | 0 | 4 | 1 | 0.0 | Manager | 19777.0 |
| 2231 | 1 | 36.0 | Self Enquiry | 1 | NaN | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 4 | 0 | 1.0 | Manager | 19834.0 |
| 2248 | 0 | 46.0 | Self Enquiry | 1 | NaN | Salaried | Female | 3 | 4.0 | Deluxe | 5.0 | Married | 1.0 | 1 | 1 | 1 | 2.0 | Manager | 19615.0 |
| 2256 | 0 | 27.0 | Company Invited | 1 | NaN | Salaried | Male | 2 | 5.0 | Basic | 3.0 | Married | 2.0 | 0 | 1 | 1 | 1.0 | Executive | 19621.0 |
| 2262 | 1 | 33.0 | Company Invited | 1 | NaN | Small Business | Female | 2 | 4.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 5 | 1 | 0.0 | Manager | 19508.0 |
| 2271 | 1 | 50.0 | Company Invited | 3 | NaN | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Single | 4.0 | 1 | 1 | 1 | 1.0 | Manager | 19728.0 |
| 2272 | 0 | 33.0 | Company Invited | 3 | NaN | Salaried | Female | 1 | 3.0 | Deluxe | 4.0 | Married | 1.0 | 0 | 4 | 0 | 0.0 | Manager | NaN |
| 2294 | 0 | 42.0 | Self Enquiry | 1 | NaN | Small Business | Male | 2 | 5.0 | Deluxe | 3.0 | Single | 5.0 | 0 | 3 | 1 | 1.0 | Manager | NaN |
| 2300 | 0 | 41.0 | Self Enquiry | 1 | NaN | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Married | 4.0 | 1 | 3 | 1 | 0.0 | Executive | 19766.0 |
| 2305 | 0 | 35.0 | Self Enquiry | 2 | NaN | Large Business | Male | 3 | 3.0 | Basic | 3.0 | Single | 2.0 | 0 | 5 | 0 | 1.0 | Executive | NaN |
| 2313 | 0 | 26.0 | NaN | 1 | NaN | Small Business | Male | 2 | 1.0 | Basic | 3.0 | Married | 2.0 | 0 | 5 | 1 | 1.0 | Executive | NaN |
| 2315 | 0 | 40.0 | Company Invited | 1 | NaN | Small Business | Female | 3 | 4.0 | Deluxe | 3.0 | Married | 4.0 | 1 | 1 | 0 | 1.0 | Manager | NaN |
| 2324 | 0 | 45.0 | Self Enquiry | 1 | NaN | Small Business | Female | 2 | 3.0 | Basic | 3.0 | Married | 5.0 | 1 | 3 | 0 | 1.0 | Executive | NaN |
| 2336 | 0 | 40.0 | Company Invited | 3 | NaN | Small Business | Male | 3 | 3.0 | Deluxe | 4.0 | Married | 6.0 | 0 | 4 | 1 | 1.0 | Manager | NaN |
| 2342 | 0 | 33.0 | Company Invited | 3 | NaN | Small Business | Female | 2 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 3 | 0 | 1.0 | Manager | 19539.0 |
| 2345 | 0 | 44.0 | Self Enquiry | 1 | NaN | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 5 | 1 | 1.0 | Manager | 19541.0 |
| 2387 | 1 | 34.0 | Self Enquiry | 1 | NaN | Salaried | Female | 2 | 3.0 | Basic | 3.0 | Single | 2.0 | 1 | 5 | 1 | 0.0 | Executive | 19538.0 |
| 2390 | 1 | 34.0 | Company Invited | 3 | NaN | Salaried | Female | 2 | 5.0 | Basic | 3.0 | Single | 2.0 | 0 | 3 | 0 | 1.0 | Executive | NaN |
| 2393 | 1 | 34.0 | Self Enquiry | 3 | NaN | Small Business | Male | 2 | 3.0 | Standard | 3.0 | Married | 5.0 | 0 | 3 | 1 | 1.0 | Senior Manager | 19490.0 |
| 2401 | 1 | 30.0 | Company Invited | 1 | NaN | Small Business | Male | 2 | 3.0 | Basic | 5.0 | Single | 3.0 | 1 | 3 | 0 | 1.0 | Executive | 19695.0 |
| 2409 | 1 | 32.0 | Self Enquiry | 1 | NaN | Small Business | Male | 2 | 3.0 | Deluxe | 4.0 | Married | 5.0 | 0 | 4 | 1 | 1.0 | Manager | 19883.0 |
| 2411 | 1 | 30.0 | Company Invited | 3 | NaN | Small Business | Female | 3 | 3.0 | Basic | 3.0 | Single | 2.0 | 1 | 5 | 1 | 1.0 | Executive | 19627.0 |
| 2419 | 1 | 39.0 | Self Enquiry | 3 | NaN | Salaried | Male | 3 | 3.0 | Basic | 3.0 | Married | 2.0 | 1 | 1 | 1 | 2.0 | Executive | 19534.0 |
| 2429 | 1 | 40.0 | Self Enquiry | 3 | NaN | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Single | 2.0 | 0 | 5 | 0 | 0.0 | Executive | 19661.0 |
| 2431 | 1 | 35.0 | Company Invited | 1 | NaN | Small Business | Male | 3 | 3.0 | Basic | 4.0 | Married | 2.0 | 1 | 3 | 1 | 0.0 | Executive | NaN |
| 2438 | 1 | 36.0 | Self Enquiry | 2 | NaN | Salaried | Male | 2 | 4.0 | Basic | 5.0 | Married | 5.0 | 1 | 5 | 1 | 1.0 | Executive | 19639.0 |
| 2621 | 1 | 20.0 | Self Enquiry | 1 | NaN | Salaried | Male | 3 | 5.0 | Basic | 3.0 | Single | 3.0 | 0 | 5 | 0 | 1.0 | Executive | 19780.0 |
| 2745 | 0 | 19.0 | Self Enquiry | 3 | NaN | Small Business | Female | 4 | 5.0 | Basic | 3.0 | Single | 3.0 | 0 | 2 | 1 | 3.0 | Executive | 19878.0 |
| 2957 | 1 | 21.0 | Self Enquiry | 1 | NaN | Small Business | Male | 3 | 4.0 | Basic | 3.0 | Single | 3.0 | 1 | 5 | 0 | 1.0 | Executive | 19687.0 |
| 3171 | 0 | 19.0 | Company Invited | 1 | NaN | Salaried | Male | 4 | 4.0 | Basic | 3.0 | Single | 3.0 | 0 | 2 | 1 | 2.0 | Executive | 19729.0 |
| 3208 | 0 | 29.0 | Self Enquiry | 3 | NaN | Small Business | Male | 4 | 4.0 | Basic | 4.0 | Divorced | 3.0 | 0 | 5 | 1 | 2.0 | Executive | 19730.0 |
| 3355 | 1 | 26.0 | Company Invited | 3 | NaN | Salaried | Male | 4 | 6.0 | Basic | 3.0 | Single | 3.0 | 1 | 1 | 1 | 2.0 | Executive | 19796.0 |
| 3782 | 1 | 31.0 | Self Enquiry | 3 | NaN | Small Business | Male | 3 | 4.0 | Basic | 3.0 | Single | 3.0 | 0 | 5 | 0 | 1.0 | Executive | 19759.0 |
| 3809 | 1 | 30.0 | Company Invited | 3 | NaN | Large Business | Male | 3 | 2.0 | Basic | 5.0 | Single | 3.0 | 0 | 3 | 1 | 2.0 | Executive | 19769.0 |
| 3846 | 0 | 32.0 | Self Enquiry | 1 | NaN | Small Business | Female | 3 | 6.0 | Basic | 3.0 | Married | 3.0 | 1 | 5 | 1 | 1.0 | Executive | 19807.0 |
| 4091 | 1 | 20.0 | Self Enquiry | 1 | NaN | Salaried | Male | 3 | 5.0 | Basic | 3.0 | Single | 3.0 | 0 | 5 | 1 | 1.0 | Executive | 19780.0 |
| 4215 | 0 | 19.0 | Self Enquiry | 3 | NaN | Small Business | Female | 4 | 5.0 | Basic | 3.0 | Single | 3.0 | 0 | 1 | 0 | 3.0 | Executive | 19878.0 |
| 4427 | 1 | 21.0 | Self Enquiry | 1 | NaN | Small Business | Male | 3 | 4.0 | Basic | 3.0 | Single | 3.0 | 1 | 5 | 1 | 2.0 | Executive | 19687.0 |
| 4641 | 0 | 19.0 | Company Invited | 1 | NaN | Salaried | Male | 4 | 4.0 | Basic | 3.0 | Single | 3.0 | 0 | 1 | 0 | 2.0 | Executive | 19729.0 |
| 4678 | 0 | 29.0 | Self Enquiry | 3 | NaN | Small Business | Male | 4 | 4.0 | Basic | 4.0 | Married | 3.0 | 0 | 5 | 0 | 3.0 | Executive | 19730.0 |
| 4825 | 1 | 26.0 | Self Enquiry | 1 | NaN | Salaried | Male | 3 | 4.0 | Basic | 5.0 | Married | 6.0 | 0 | 5 | 1 | 1.0 | Executive | 19796.0 |
# We'll impute these missing values with the mean DurationOfPitch of each Occupation and ProductPitched combination
df.groupby(["Occupation", "ProductPitched"], as_index=False)[
"DurationOfPitch"
].mean().round(0)
| | Occupation | ProductPitched | DurationOfPitch |
|---|---|---|---|
| 0 | Free Lancer | Basic | 8.0 |
| 1 | Free Lancer | Deluxe | NaN |
| 2 | Free Lancer | King | NaN |
| 3 | Free Lancer | Standard | NaN |
| 4 | Free Lancer | Super Deluxe | NaN |
| 5 | Large Business | Basic | 14.0 |
| 6 | Large Business | Deluxe | 16.0 |
| 7 | Large Business | King | 13.0 |
| 8 | Large Business | Standard | 15.0 |
| 9 | Large Business | Super Deluxe | 13.0 |
| 10 | Salaried | Basic | 15.0 |
| 11 | Salaried | Deluxe | 15.0 |
| 12 | Salaried | King | 13.0 |
| 13 | Salaried | Standard | 16.0 |
| 14 | Salaried | Super Deluxe | 17.0 |
| 15 | Small Business | Basic | 16.0 |
| 16 | Small Business | Deluxe | 17.0 |
| 17 | Small Business | King | 11.0 |
| 18 | Small Business | Standard | 16.0 |
| 19 | Small Business | Super Deluxe | 16.0 |
# Impute missing DurationOfPitch values with the rounded mean of each
# (Occupation, ProductPitched) group
df["DurationOfPitch"] = df.groupby(["Occupation", "ProductPitched"])[
"DurationOfPitch"
].transform(lambda x: x.fillna(x.mean()).round())
# Check that no missing values remain in DurationOfPitch (an empty result is expected)
df[df["DurationOfPitch"].isnull()]
| | ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
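The group-mean table above shows NaN for some groups (for example Free Lancer with Deluxe), meaning no pitch duration was observed for them. The check came back empty here, so no such rows needed filling in this dataset. As a minimal sketch on toy data (the values and the helper name `impute_group_mean` are hypothetical, not part of the original notebook), the same group-wise imputation can be given a fallback to the overall mean for groups that are entirely missing:

```python
import numpy as np
import pandas as pd

# Toy data with hypothetical values (not taken from the tourism dataset)
toy = pd.DataFrame(
    {
        "Occupation": [
            "Salaried", "Salaried", "Salaried",
            "Small Business", "Free Lancer", "Free Lancer",
        ],
        "ProductPitched": ["Basic", "Basic", "Basic", "Basic", "Deluxe", "Deluxe"],
        "DurationOfPitch": [10.0, np.nan, 22.0, 8.0, np.nan, np.nan],
    }
)


def impute_group_mean(frame, keys, col):
    """Fill NaNs in `col` with the mean of each `keys` group.

    Groups whose values are all NaN have no group mean, so they fall
    back to the overall column mean. The result is rounded to whole numbers.
    """
    group_mean = frame.groupby(keys)[col].transform("mean")
    filled = frame[col].fillna(group_mean)      # group-level fill
    filled = filled.fillna(frame[col].mean())   # fallback for all-NaN groups
    return filled.round(0)


toy["DurationOfPitch"] = impute_group_mean(
    toy, ["Occupation", "ProductPitched"], "DurationOfPitch"
)
print(toy)
```

Whether the overall mean is a sensible fallback depends on the data. Dropping such rows or using the median are equally reasonable choices.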
# Looking at the rows where MonthlyIncome is missing
df[df["MonthlyIncome"].isnull()]
| | ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | 0 | NaN | Self Enquiry | 1 | 21.0 | Salaried | Female | 2 | 4.0 | Deluxe | 3.0 | Single | 1.0 | 1 | 3 | 0 | 0.0 | Manager | NaN |
| 19 | 0 | NaN | Self Enquiry | 1 | 8.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Single | 6.0 | 1 | 4 | 0 | 1.0 | Executive | NaN |
| 20 | 0 | NaN | Company Invited | 1 | 17.0 | Salaried | Female | 3 | 2.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 3 | 1 | 2.0 | Manager | NaN |
| 26 | 1 | NaN | Company Invited | 1 | 22.0 | Salaried | Female | 3 | 5.0 | Basic | 5.0 | Single | 2.0 | 1 | 4 | 1 | 2.0 | Executive | NaN |
| 44 | 0 | NaN | Company Invited | 1 | 6.0 | Small Business | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 3 | 1 | 0.0 | Manager | NaN |
| 54 | 0 | NaN | Self Enquiry | 3 | 29.0 | Small Business | Female | 2 | 4.0 | Deluxe | 3.0 | Divorced | 1.0 | 1 | 2 | 1 | 0.0 | Manager | NaN |
| 57 | 0 | NaN | Self Enquiry | 1 | 29.0 | Small Business | Female | 1 | 3.0 | Basic | 5.0 | Divorced | 4.0 | 1 | 4 | 1 | 0.0 | Executive | NaN |
| 75 | 0 | 31.0 | Self Enquiry | 1 | 15.0 | Salaried | Female | 3 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 5 | 0 | 1.0 | Manager | NaN |
| 76 | 0 | 35.0 | Self Enquiry | 3 | 17.0 | Small Business | Male | 2 | 4.0 | Deluxe | 5.0 | Single | 1.0 | 0 | 2 | 0 | 1.0 | Manager | NaN |
| 84 | 0 | 34.0 | Self Enquiry | 1 | 17.0 | Small Business | Male | 3 | 3.0 | Deluxe | 4.0 | Divorced | 2.0 | 0 | 5 | 0 | 0.0 | Manager | NaN |
| 88 | 0 | NaN | Self Enquiry | 1 | 8.0 | Salaried | Male | 2 | 4.0 | Deluxe | 3.0 | Divorced | 2.0 | 0 | 4 | 1 | 0.0 | Manager | NaN |
| 97 | 0 | NaN | Company Invited | 3 | 10.0 | Small Business | Male | 2 | 3.0 | Deluxe | 3.0 | Divorced | 2.0 | 0 | 2 | 1 | 0.0 | Manager | NaN |
| 140 | 1 | NaN | Self Enquiry | 1 | 15.0 | Small Business | Female | 2 | 3.0 | Basic | 5.0 | Single | 1.0 | 0 | 4 | 1 | 1.0 | Executive | NaN |
| 155 | 0 | 29.0 | Company Invited | 1 | 16.0 | Large Business | Male | 2 | 3.0 | Deluxe | 3.0 | Divorced | 2.0 | 0 | 3 | 0 | 1.0 | Manager | NaN |
| 175 | 0 | 56.0 | Self Enquiry | 1 | 15.0 | Salaried | Female | 3 | 3.0 | Basic | 5.0 | Married | 5.0 | 1 | 4 | 1 | 1.0 | Executive | NaN |
| 180 | 0 | NaN | Self Enquiry | 1 | 18.0 | Small Business | Female | 3 | 3.0 | Basic | 3.0 | Divorced | 1.0 | 1 | 4 | 1 | 2.0 | Executive | NaN |
| 184 | 0 | 53.0 | Self Enquiry | 1 | 17.0 | Small Business | Female | 2 | 2.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 2 | 1 | 1.0 | Manager | NaN |
| 196 | 0 | 35.0 | Company Invited | 1 | 17.0 | Small Business | Female | 2 | 3.0 | Deluxe | 4.0 | Single | 6.0 | 0 | 2 | 1 | 0.0 | Manager | NaN |
| 200 | 0 | 27.0 | Company Invited | 1 | 16.0 | Large Business | Male | 2 | 4.0 | Deluxe | 5.0 | Divorced | 6.0 | 0 | 3 | 0 | 1.0 | Manager | NaN |
| 202 | 0 | NaN | Company Invited | 1 | 16.0 | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Married | 2.0 | 1 | 3 | 1 | 0.0 | Executive | NaN |
| 224 | 0 | 31.0 | NaN | 1 | 17.0 | Small Business | Male | 2 | 5.0 | Deluxe | 3.0 | Divorced | 1.0 | 0 | 3 | 1 | 0.0 | Manager | NaN |
| 238 | 0 | NaN | Self Enquiry | 3 | 10.0 | Salaried | Female | 2 | 3.0 | Basic | 4.0 | Divorced | 3.0 | 0 | 3 | 1 | 0.0 | Executive | NaN |
| 239 | 1 | NaN | Self Enquiry | 1 | 6.0 | Salaried | Male | 3 | 4.0 | Basic | 3.0 | Single | 1.0 | 1 | 5 | 1 | 0.0 | Executive | NaN |
| 241 | 0 | 32.0 | Company Invited | 3 | 17.0 | Small Business | Male | 3 | 3.0 | Deluxe | 3.0 | Divorced | 1.0 | 0 | 4 | 1 | 2.0 | Manager | NaN |
| 248 | 0 | NaN | Self Enquiry | 1 | 6.0 | Small Business | Female | 2 | 4.0 | Basic | 5.0 | Divorced | 3.0 | 0 | 4 | 1 | 1.0 | Executive | NaN |
| 267 | 0 | NaN | Company Invited | 1 | 11.0 | Salaried | Male | 2 | 2.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 5 | 1 | 1.0 | Manager | NaN |
| 320 | 0 | 27.0 | Self Enquiry | 3 | 15.0 | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 1 | 4 | 1 | 2.0 | Manager | NaN |
| 337 | 0 | NaN | Self Enquiry | 1 | 15.0 | Salaried | Male | 1 | 4.0 | Basic | 3.0 | Single | 1.0 | 0 | 2 | 1 | 0.0 | Executive | NaN |
| 373 | 0 | NaN | Self Enquiry | 1 | 6.0 | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Married | 2.0 | 0 | 3 | 0 | 0.0 | Executive | NaN |
| 380 | 0 | 24.0 | Self Enquiry | 3 | 17.0 | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 1 | 2 | 0 | 2.0 | Manager | NaN |
| 389 | 0 | NaN | Self Enquiry | 1 | 16.0 | Salaried | Male | 2 | 3.0 | Deluxe | 4.0 | Single | 3.0 | 1 | 5 | 0 | 1.0 | Manager | NaN |
| 394 | 0 | 31.0 | Self Enquiry | 1 | 17.0 | Small Business | Female | 2 | 3.0 | Deluxe | 5.0 | Divorced | 2.0 | 0 | 2 | 1 | 0.0 | Manager | NaN |
| 398 | 0 | 37.0 | Company Invited | 1 | 17.0 | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Divorced | 4.0 | 1 | 3 | 1 | 1.0 | Manager | NaN |
| 405 | 1 | NaN | Self Enquiry | 1 | 9.0 | Small Business | Male | 3 | 3.0 | Basic | 5.0 | Divorced | 6.0 | 0 | 5 | 1 | 1.0 | Executive | NaN |
| 430 | 0 | 35.0 | Self Enquiry | 1 | 28.0 | Salaried | Male | 2 | 5.0 | Basic | 3.0 | Single | 1.0 | 0 | 4 | 1 | 0.0 | Executive | NaN |
| 431 | 0 | NaN | Self Enquiry | 1 | 14.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 1.0 | 0 | 2 | 1 | 0.0 | Manager | NaN |
| 443 | 1 | NaN | Company Invited | 1 | 10.0 | Large Business | Male | 2 | 4.0 | Basic | 3.0 | Single | 6.0 | 0 | 5 | 1 | 0.0 | Executive | NaN |
| 444 | 0 | NaN | Self Enquiry | 3 | 8.0 | Small Business | Female | 2 | 3.0 | Deluxe | 3.0 | Divorced | 3.0 | 0 | 5 | 1 | 0.0 | Manager | NaN |
| 449 | 0 | NaN | Company Invited | 1 | 14.0 | Salaried | Female | 2 | 3.0 | Basic | 3.0 | Divorced | 2.0 | 0 | 2 | 0 | 0.0 | Executive | NaN |
| 454 | 0 | 29.0 | Self Enquiry | 1 | 16.0 | Large Business | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 5.0 | 0 | 5 | 0 | 0.0 | Manager | NaN |
| 460 | 0 | 26.0 | Self Enquiry | 3 | 17.0 | Small Business | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 6.0 | 0 | 5 | 1 | 0.0 | Manager | NaN |
| 482 | 1 | 31.0 | Self Enquiry | 3 | 19.0 | Small Business | Male | 2 | 3.0 | Deluxe | 5.0 | Married | 3.0 | 0 | 2 | 0 | 0.0 | Manager | NaN |
| 488 | 0 | NaN | Self Enquiry | 1 | 8.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Divorced | 2.0 | 0 | 2 | 1 | 0.0 | Manager | NaN |
| 504 | 1 | 45.0 | Company Invited | 3 | 15.0 | Salaried | Female | 3 | 3.0 | Deluxe | 5.0 | Divorced | 3.0 | 0 | 2 | 0 | 0.0 | Manager | NaN |
| 518 | 0 | NaN | Self Enquiry | 3 | 13.0 | Small Business | Female | 2 | 4.0 | Deluxe | 3.0 | Single | 1.0 | 0 | 2 | 1 | 1.0 | Manager | NaN |
| 539 | 0 | NaN | Self Enquiry | 3 | 14.0 | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Divorced | 1.0 | 0 | 2 | 1 | 0.0 | Executive | NaN |
| 570 | 0 | 52.0 | Company Invited | 1 | 16.0 | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Divorced | 1.0 | 0 | 3 | 1 | 0.0 | Executive | NaN |
| 571 | 0 | 26.0 | NaN | 1 | 15.0 | Salaried | Female | 3 | 5.0 | Basic | 3.0 | Married | 4.0 | 0 | 4 | 1 | 2.0 | Executive | NaN |
| 572 | 0 | 29.0 | NaN | 1 | 17.0 | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Divorced | 5.0 | 0 | 2 | 1 | 0.0 | Manager | NaN |
| 576 | 0 | 27.0 | NaN | 3 | 17.0 | Small Business | Male | 2 | 3.0 | Deluxe | 3.0 | Divorced | 1.0 | 0 | 3 | 0 | 1.0 | Manager | NaN |
| 579 | 0 | 34.0 | NaN | 1 | 16.0 | Small Business | Female | 2 | 4.0 | Basic | 5.0 | Single | 2.0 | 0 | 2 | 1 | 1.0 | Executive | NaN |
| 581 | 0 | NaN | Self Enquiry | 1 | 6.0 | Salaried | Male | 2 | 5.0 | Basic | 3.0 | Divorced | 4.0 | 0 | 4 | 1 | 0.0 | Executive | NaN |
| 582 | 0 | 40.0 | Company Invited | 1 | 17.0 | Small Business | Female | 2 | 1.0 | Deluxe | 4.0 | Divorced | 2.0 | 0 | 3 | 0 | 1.0 | Manager | NaN |
| 598 | 1 | 28.0 | NaN | 1 | 16.0 | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Single | 7.0 | 0 | 3 | 0 | 0.0 | Executive | NaN |
| 604 | 0 | 42.0 | Self Enquiry | 1 | 15.0 | Salaried | Male | 3 | 3.0 | Deluxe | 4.0 | Divorced | 2.0 | 0 | 3 | 1 | 1.0 | Manager | NaN |
| 613 | 0 | NaN | Self Enquiry | 2 | 9.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Divorced | 1.0 | 0 | 2 | 1 | 1.0 | Executive | NaN |
| 619 | 0 | NaN | Self Enquiry | 3 | 6.0 | Small Business | Male | 2 | 1.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 4 | 0 | 0.0 | Manager | NaN |
| 622 | 0 | 32.0 | NaN | 3 | 15.0 | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Married | 3.0 | 0 | 2 | 0 | 0.0 | Manager | NaN |
| 623 | 0 | NaN | Company Invited | 1 | 11.0 | Salaried | Male | 3 | 4.0 | Basic | 3.0 | Married | 1.0 | 0 | 5 | 0 | 0.0 | Executive | NaN |
| 634 | 0 | NaN | Self Enquiry | 3 | 9.0 | Salaried | Male | 3 | 3.0 | Deluxe | 5.0 | Divorced | 2.0 | 1 | 4 | 1 | 0.0 | Manager | NaN |
| 638 | 0 | 25.0 | Self Enquiry | 3 | 17.0 | Small Business | Male | 2 | 4.0 | Deluxe | 5.0 | Divorced | 2.0 | 0 | 3 | 1 | 1.0 | Manager | NaN |
| 666 | 1 | NaN | Self Enquiry | 1 | 9.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Divorced | 1.0 | 1 | 5 | 1 | 1.0 | Manager | NaN |
| 676 | 0 | NaN | Self Enquiry | 1 | 27.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Divorced | 2.0 | 1 | 3 | 0 | 1.0 | Manager | NaN |
| 685 | 0 | 26.0 | Company Invited | 3 | 17.0 | Small Business | Male | 2 | 4.0 | Deluxe | 5.0 | Single | 2.0 | 0 | 3 | 1 | 1.0 | Manager | NaN |
| 696 | 0 | 45.0 | Company Invited | 1 | 15.0 | Salaried | Male | 2 | 3.0 | Deluxe | 4.0 | Divorced | 2.0 | 0 | 4 | 1 | 1.0 | Manager | NaN |
| 719 | 0 | NaN | Self Enquiry | 3 | 10.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 1 | 3 | 1 | 1.0 | Manager | NaN |
| 724 | 0 | 24.0 | NaN | 1 | 17.0 | Small Business | Female | 2 | 4.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 3 | 1 | 1.0 | Manager | NaN |
| 725 | 1 | NaN | Self Enquiry | 1 | 20.0 | Salaried | Male | 2 | 4.0 | Basic | 4.0 | Married | 2.0 | 1 | 5 | 1 | 1.0 | Executive | NaN |
| 726 | 0 | NaN | Company Invited | 1 | 6.0 | Salaried | Female | 3 | 3.0 | Deluxe | 5.0 | Divorced | 2.0 | 0 | 5 | 1 | 2.0 | Manager | NaN |
| 735 | 0 | 48.0 | Self Enquiry | 1 | 15.0 | Salaried | Male | 3 | 4.0 | Deluxe | 3.0 | Single | 3.0 | 0 | 2 | 0 | 0.0 | Manager | NaN |
| 739 | 0 | 27.0 | Self Enquiry | 1 | 8.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Divorced | 1.0 | 0 | 5 | 0 | 1.0 | Manager | NaN |
| 740 | 0 | NaN | Self Enquiry | 1 | 16.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Married | 2.0 | 0 | 5 | 1 | 0.0 | Executive | NaN |
| 756 | 0 | NaN | Company Invited | 1 | 35.0 | Small Business | Female | 3 | 3.0 | Basic | 3.0 | Single | 1.0 | 0 | 3 | 1 | 0.0 | Executive | NaN |
| 767 | 0 | NaN | Self Enquiry | 1 | 9.0 | Salaried | Female | 2 | 3.0 | Deluxe | 4.0 | Single | 4.0 | 0 | 4 | 1 | 0.0 | Manager | NaN |
| 781 | 0 | NaN | Self Enquiry | 1 | 6.0 | Small Business | Male | 2 | 4.0 | Basic | 5.0 | Divorced | 2.0 | 0 | 3 | 0 | 1.0 | Executive | NaN |
| 802 | 0 | 33.0 | Company Invited | 3 | 15.0 | Salaried | Female | 1 | 3.0 | Deluxe | 4.0 | Divorced | 1.0 | 0 | 4 | 1 | 0.0 | Manager | NaN |
| 822 | 0 | NaN | Company Invited | 1 | 8.0 | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Single | 3.0 | 0 | 4 | 1 | 2.0 | Manager | NaN |
| 824 | 0 | 42.0 | Self Enquiry | 1 | 17.0 | Small Business | Male | 2 | 5.0 | Deluxe | 3.0 | Single | 5.0 | 0 | 3 | 1 | 0.0 | Manager | NaN |
| 835 | 0 | 35.0 | Self Enquiry | 2 | 14.0 | Large Business | Male | 3 | 3.0 | Basic | 3.0 | Single | 2.0 | 0 | 5 | 0 | 1.0 | Executive | NaN |
| 843 | 0 | 26.0 | NaN | 1 | 16.0 | Small Business | Male | 2 | 1.0 | Basic | 3.0 | Divorced | 2.0 | 0 | 5 | 1 | 1.0 | Executive | NaN |
| 845 | 0 | 40.0 | Company Invited | 1 | 17.0 | Small Business | Female | 3 | 4.0 | Deluxe | 3.0 | Divorced | 4.0 | 1 | 2 | 1 | 0.0 | Manager | NaN |
| 854 | 0 | 45.0 | Self Enquiry | 1 | 16.0 | Small Business | Female | 2 | 3.0 | Basic | 3.0 | Divorced | 5.0 | 1 | 3 | 1 | 0.0 | Executive | NaN |
| 865 | 0 | NaN | Self Enquiry | 3 | 35.0 | Salaried | Male | 3 | 3.0 | Deluxe | 5.0 | Married | 1.0 | 0 | 2 | 1 | 1.0 | Manager | NaN |
| 866 | 0 | 40.0 | Company Invited | 3 | 17.0 | Small Business | Male | 3 | 3.0 | Deluxe | 4.0 | Divorced | 6.0 | 0 | 4 | 1 | 2.0 | Manager | NaN |
| 893 | 0 | NaN | Self Enquiry | 1 | 6.0 | Salaried | Female | 3 | 3.0 | Basic | 3.0 | Married | 2.0 | 0 | 2 | 0 | 2.0 | Executive | NaN |
| 920 | 0 | 34.0 | Company Invited | 1 | 17.0 | Small Business | Female | 2 | 3.0 | Deluxe | 4.0 | Married | 5.0 | 0 | 3 | 1 | 0.0 | Manager | NaN |
| 929 | 0 | NaN | Company Invited | 1 | 8.0 | Salaried | Male | 2 | 4.0 | Basic | 3.0 | Divorced | 2.0 | 1 | 4 | 1 | 0.0 | Executive | NaN |
| 940 | 1 | NaN | Self Enquiry | 1 | 29.0 | Small Business | Male | 3 | 3.0 | Basic | 5.0 | Single | 1.0 | 0 | 3 | 0 | 2.0 | Executive | NaN |
| 960 | 0 | NaN | Company Invited | 3 | 6.0 | Small Business | Female | 3 | 3.0 | Deluxe | 5.0 | Married | 1.0 | 0 | 3 | 0 | 1.0 | Manager | NaN |
| 961 | 0 | 35.0 | Company Invited | 1 | 15.0 | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 1 | 1 | 1 | 1.0 | Manager | NaN |
| 991 | 0 | NaN | Self Enquiry | 3 | 8.0 | Small Business | Male | 2 | 3.0 | Deluxe | 4.0 | Divorced | 1.0 | 1 | 1 | 0 | 1.0 | Manager | NaN |
| 995 | 0 | NaN | Self Enquiry | 1 | 12.0 | Small Business | Female | 3 | 4.0 | Deluxe | 3.0 | Single | 2.0 | 1 | 4 | 1 | 1.0 | Manager | NaN |
| 998 | 0 | NaN | Self Enquiry | 1 | 8.0 | Small Business | Male | 2 | 4.0 | Basic | 3.0 | Single | 1.0 | 0 | 3 | 0 | 0.0 | Executive | NaN |
| 1006 | 1 | 49.0 | Company Invited | 1 | 15.0 | Salaried | Male | 3 | 4.0 | Deluxe | 5.0 | Single | 4.0 | 0 | 3 | 0 | 1.0 | Manager | NaN |
| 1021 | 1 | 25.0 | NaN | 3 | 15.0 | Salaried | Male | 3 | 4.0 | Basic | 5.0 | Divorced | 4.0 | 0 | 1 | 1 | 0.0 | Executive | NaN |
| 1025 | 0 | NaN | Self Enquiry | 3 | 10.0 | Small Business | Female | 2 | 3.0 | Deluxe | 3.0 | Divorced | 2.0 | 1 | 5 | 1 | 0.0 | Manager | NaN |
| 1029 | 0 | NaN | Company Invited | 1 | 15.0 | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 4.0 | 0 | 3 | 1 | 0.0 | Manager | NaN |
| 1036 | 1 | NaN | Company Invited | 1 | 8.0 | Salaried | Male | 3 | 3.0 | Basic | 3.0 | Divorced | 7.0 | 1 | 3 | 1 | 1.0 | Executive | NaN |
| 1047 | 0 | 33.0 | NaN | 3 | 17.0 | Small Business | Male | 2 | 3.0 | Deluxe | 5.0 | Divorced | 1.0 | 0 | 3 | 0 | 0.0 | Manager | NaN |
| 1065 | 0 | NaN | Self Enquiry | 1 | 10.0 | Salaried | Male | 1 | 3.0 | Deluxe | 3.0 | Divorced | 1.0 | 1 | 4 | 0 | 0.0 | Manager | NaN |
| 1066 | 0 | NaN | Self Enquiry | 1 | 10.0 | Small Business | Female | 2 | 4.0 | Basic | 4.0 | Divorced | 1.0 | 0 | 1 | 0 | 0.0 | Executive | NaN |
| 1085 | 1 | NaN | Company Invited | 1 | 9.0 | Salaried | Female | 2 | 3.0 | Basic | 3.0 | Single | 2.0 | 0 | 4 | 1 | 1.0 | Executive | NaN |
| 1089 | 0 | 37.0 | Self Enquiry | 1 | 17.0 | Small Business | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 1 | 0 | 0.0 | Manager | NaN |
| 1091 | 0 | 33.0 | Self Enquiry | 1 | 15.0 | Salaried | Male | 2 | 4.0 | Deluxe | 4.0 | Single | 2.0 | 0 | 1 | 0 | 1.0 | Manager | NaN |
| 1098 | 0 | NaN | Company Invited | 1 | 14.0 | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Married | 4.0 | 1 | 3 | 1 | 1.0 | Manager | NaN |
| 1113 | 0 | NaN | Company Invited | 1 | 6.0 | Large Business | Male | 3 | 3.0 | Deluxe | 5.0 | Married | 5.0 | 0 | 4 | 1 | 2.0 | Manager | NaN |
| 1120 | 0 | NaN | Self Enquiry | 3 | 22.0 | Salaried | Female | 2 | 3.0 | Deluxe | 4.0 | Single | 3.0 | 0 | 1 | 0 | 0.0 | Manager | NaN |
| 1143 | 0 | 45.0 | NaN | 3 | 17.0 | Small Business | Male | 2 | 4.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 3 | 0 | 0.0 | Manager | NaN |
| 1149 | 0 | NaN | Self Enquiry | 1 | 25.0 | Salaried | Male | 3 | 4.0 | Basic | 5.0 | Married | 2.0 | 0 | 1 | 0 | 0.0 | Executive | NaN |
| 1157 | 0 | NaN | Company Invited | 1 | 14.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 5 | 1 | 1.0 | Manager | NaN |
| 1163 | 0 | NaN | Self Enquiry | 1 | 16.0 | Small Business | Female | 3 | 3.0 | Basic | 4.0 | Married | 2.0 | 0 | 1 | 1 | 0.0 | Executive | NaN |
| 1168 | 0 | NaN | Company Invited | 1 | 8.0 | Large Business | Female | 2 | 3.0 | Basic | 3.0 | Single | 2.0 | 1 | 3 | 1 | 1.0 | Executive | NaN |
| 1182 | 0 | 36.0 | NaN | 1 | 17.0 | Small Business | Female | 2 | 4.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 5 | 1 | 1.0 | Manager | NaN |
| 1188 | 0 | NaN | Self Enquiry | 3 | 11.0 | Small Business | Male | 2 | 4.0 | Deluxe | 4.0 | Married | 2.0 | 1 | 5 | 0 | 0.0 | Manager | NaN |
| 1201 | 1 | NaN | Self Enquiry | 1 | 14.0 | Small Business | Male | 3 | 4.0 | Basic | 3.0 | Single | 2.0 | 1 | 4 | 1 | 2.0 | Executive | NaN |
| 1208 | 0 | NaN | Self Enquiry | 1 | 11.0 | Small Business | Male | 2 | 4.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 4 | 1 | 1.0 | Manager | NaN |
| 1217 | 0 | 24.0 | NaN | 1 | 16.0 | Small Business | Male | 3 | 1.0 | Basic | 3.0 | Married | 2.0 | 0 | 1 | 0 | 0.0 | Executive | NaN |
| 1230 | 0 | NaN | Self Enquiry | 1 | 35.0 | Small Business | Male | 3 | 3.0 | Basic | 5.0 | Married | 2.0 | 0 | 4 | 1 | 1.0 | Executive | NaN |
| 1267 | 0 | NaN | Company Invited | 3 | 16.0 | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 1 | 1 | 1 | 1.0 | Manager | NaN |
| 1276 | 0 | NaN | Self Enquiry | 3 | 15.0 | Small Business | Male | 2 | 4.0 | Deluxe | 4.0 | Married | 2.0 | 0 | 4 | 0 | 1.0 | Manager | NaN |
| 1280 | 0 | NaN | Self Enquiry | 2 | 14.0 | Salaried | Male | 2 | 3.0 | Deluxe | 4.0 | Married | 3.0 | 0 | 3 | 1 | 1.0 | Manager | NaN |
| 1291 | 1 | NaN | Self Enquiry | 1 | 16.0 | Small Business | Male | 2 | 3.0 | Deluxe | 5.0 | Single | 2.0 | 0 | 4 | 0 | 0.0 | Manager | NaN |
| 1292 | 0 | NaN | Company Invited | 3 | 26.0 | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 3.0 | 0 | 3 | 1 | 0.0 | Manager | NaN |
| 1304 | 0 | 40.0 | Self Enquiry | 1 | 15.0 | Salaried | Female | 2 | 3.0 | Deluxe | 5.0 | Married | 3.0 | 0 | 1 | 1 | 1.0 | Manager | NaN |
| 1318 | 0 | NaN | Company Invited | 1 | 26.0 | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Married | 2.0 | 0 | 3 | 1 | 1.0 | Executive | NaN |
| 1325 | 0 | NaN | Self Enquiry | 1 | 14.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Single | 5.0 | 0 | 1 | 1 | 1.0 | Executive | NaN |
| 1335 | 0 | NaN | Self Enquiry | 1 | 25.0 | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 5 | 1 | 0.0 | Manager | NaN |
| 1341 | 0 | NaN | Self Enquiry | 1 | 26.0 | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 5 | 0 | 0.0 | Manager | NaN |
| 1344 | 0 | 37.0 | Self Enquiry | 1 | 17.0 | Small Business | Male | 3 | 3.0 | Deluxe | 5.0 | Married | 6.0 | 1 | 3 | 0 | 0.0 | Manager | NaN |
| 1347 | 0 | NaN | Company Invited | 2 | 8.0 | Salaried | Male | 3 | 4.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 3 | 1 | 0.0 | Manager | NaN |
| 1356 | 0 | 41.0 | NaN | 3 | 17.0 | Small Business | Female | 2 | 3.0 | Deluxe | 4.0 | Married | 6.0 | 0 | 3 | 1 | 1.0 | Manager | NaN |
| 1360 | 0 | NaN | Self Enquiry | 1 | 10.0 | Small Business | Female | 3 | 1.0 | Basic | 3.0 | Married | 1.0 | 0 | 5 | 1 | 2.0 | Executive | NaN |
| 1393 | 0 | NaN | Self Enquiry | 3 | 15.0 | Small Business | Male | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 1 | 0 | 1.0 | Manager | NaN |
| 1404 | 0 | 42.0 | Company Invited | 1 | 15.0 | Salaried | Male | 2 | 4.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 3 | 1 | 1.0 | Manager | NaN |
| 1406 | 0 | 54.0 | Self Enquiry | 1 | 17.0 | Small Business | Female | 3 | 3.0 | Deluxe | 5.0 | Single | 7.0 | 1 | 1 | 0 | 0.0 | Manager | NaN |
| 1412 | 0 | NaN | Self Enquiry | 1 | 6.0 | Small Business | Male | 3 | 3.0 | Basic | 4.0 | Married | 2.0 | 0 | 5 | 0 | 1.0 | Executive | NaN |
| 1413 | 0 | NaN | Self Enquiry | 1 | 8.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Married | 7.0 | 1 | 5 | 0 | 0.0 | Executive | NaN |
| 1416 | 0 | 38.0 | Self Enquiry | 3 | 15.0 | Salaried | Male | 2 | 3.0 | Deluxe | 4.0 | Married | 1.0 | 0 | 4 | 1 | 1.0 | Manager | NaN |
| 1429 | 0 | NaN | Self Enquiry | 1 | 30.0 | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Single | 1.0 | 0 | 4 | 1 | 0.0 | Manager | NaN |
| 1459 | 0 | NaN | Self Enquiry | 1 | 19.0 | Salaried | Male | 2 | 4.0 | Deluxe | 4.0 | Married | 5.0 | 1 | 4 | 1 | 0.0 | Manager | NaN |
| 1460 | 0 | NaN | Self Enquiry | 1 | 34.0 | Small Business | Female | 3 | 4.0 | Basic | 5.0 | Single | 2.0 | 0 | 1 | 0 | 1.0 | Executive | NaN |
| 1469 | 0 | 34.0 | NaN | 1 | 17.0 | Small Business | Male | 2 | 1.0 | Deluxe | 3.0 | Married | 3.0 | 0 | 3 | 0 | 1.0 | Manager | NaN |
| 1481 | 0 | NaN | Self Enquiry | 1 | 21.0 | Salaried | Female | 2 | 4.0 | Deluxe | 3.0 | Single | 1.0 | 1 | 1 | 1 | 1.0 | Manager | NaN |
| 1489 | 0 | NaN | Self Enquiry | 1 | 8.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Single | 6.0 | 1 | 1 | 0 | 0.0 | Executive | NaN |
| 1490 | 0 | NaN | Company Invited | 1 | 17.0 | Salaried | Female | 3 | 1.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 5 | 0 | 1.0 | Manager | NaN |
| 1496 | 1 | NaN | Company Invited | 1 | 22.0 | Salaried | Female | 3 | 5.0 | Basic | 5.0 | Single | 2.0 | 1 | 4 | 0 | 0.0 | Executive | NaN |
| 1514 | 0 | NaN | Company Invited | 1 | 6.0 | Small Business | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 3 | 1 | 0.0 | Manager | NaN |
| 1524 | 0 | NaN | Self Enquiry | 3 | 29.0 | Small Business | Female | 2 | 4.0 | Deluxe | 3.0 | Married | 1.0 | 1 | 1 | 0 | 0.0 | Manager | NaN |
| 1527 | 0 | NaN | Self Enquiry | 1 | 29.0 | Small Business | Female | 1 | 3.0 | Basic | 5.0 | Married | 4.0 | 1 | 4 | 0 | 0.0 | Executive | NaN |
| 1545 | 0 | 31.0 | Self Enquiry | 1 | 15.0 | Salaried | Female | 3 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 5 | 1 | 2.0 | Manager | NaN |
| 1546 | 0 | 35.0 | Self Enquiry | 3 | 17.0 | Small Business | Male | 2 | 4.0 | Deluxe | 5.0 | Single | 1.0 | 0 | 1 | 1 | 1.0 | Manager | NaN |
| 1554 | 0 | 34.0 | Self Enquiry | 1 | 17.0 | Small Business | Male | 3 | 3.0 | Deluxe | 4.0 | Married | 2.0 | 0 | 5 | 1 | 2.0 | Manager | NaN |
| 1558 | 0 | NaN | Self Enquiry | 1 | 8.0 | Salaried | Male | 4 | 4.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 4 | 1 | 1.0 | Manager | NaN |
| 1567 | 0 | 28.0 | Company Invited | 3 | 10.0 | Small Business | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 1 | 0 | 0.0 | Manager | NaN |
| 1610 | 1 | NaN | Self Enquiry | 1 | 15.0 | Small Business | Female | 2 | 3.0 | Basic | 5.0 | Single | 1.0 | 0 | 4 | 1 | 0.0 | Executive | NaN |
| 1612 | 0 | 38.0 | Self Enquiry | 1 | 16.0 | Large Business | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 4.0 | 1 | 5 | 1 | 0.0 | Manager | NaN |
| 1625 | 0 | 29.0 | Company Invited | 1 | 16.0 | Large Business | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 3 | 0 | 0.0 | Manager | NaN |
| 1645 | 0 | 56.0 | Self Enquiry | 1 | 15.0 | Salaried | Female | 3 | 3.0 | Basic | 5.0 | Married | 5.0 | 1 | 4 | 1 | 1.0 | Executive | NaN |
| 1650 | 0 | NaN | Self Enquiry | 1 | 18.0 | Small Business | Female | 3 | 3.0 | Basic | 3.0 | Married | 1.0 | 1 | 4 | 1 | 1.0 | Executive | NaN |
| 1654 | 0 | 53.0 | Self Enquiry | 1 | 17.0 | Small Business | Female | 2 | 1.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 1 | 1 | 0.0 | Manager | NaN |
| 1666 | 0 | 35.0 | Company Invited | 1 | 17.0 | Small Business | Female | 2 | 3.0 | Deluxe | 4.0 | Single | 6.0 | 0 | 1 | 0 | 0.0 | Manager | NaN |
| 1670 | 0 | 27.0 | Company Invited | 1 | 16.0 | Large Business | Male | 2 | 4.0 | Deluxe | 5.0 | Married | 6.0 | 0 | 3 | 1 | 0.0 | Manager | NaN |
| 1672 | 0 | NaN | Company Invited | 1 | 16.0 | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Married | 2.0 | 1 | 3 | 1 | 1.0 | Executive | NaN |
| 1694 | 0 | 31.0 | NaN | 1 | 17.0 | Small Business | Male | 2 | 5.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 3 | 0 | 0.0 | Manager | NaN |
| 1708 | 0 | NaN | Self Enquiry | 3 | 10.0 | Salaried | Female | 2 | 3.0 | Basic | 4.0 | Married | 3.0 | 0 | 3 | 1 | 1.0 | Executive | NaN |
| 1709 | 1 | NaN | Self Enquiry | 1 | 6.0 | Salaried | Male | 3 | 4.0 | Basic | 3.0 | Single | 1.0 | 1 | 5 | 0 | 2.0 | Executive | NaN |
| 1711 | 0 | 32.0 | Company Invited | 3 | 17.0 | Small Business | Male | 3 | 3.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 4 | 1 | 1.0 | Manager | NaN |
| 1718 | 0 | NaN | Self Enquiry | 1 | 6.0 | Small Business | Female | 2 | 4.0 | Basic | 5.0 | Married | 3.0 | 0 | 4 | 0 | 1.0 | Executive | NaN |
| 1737 | 0 | NaN | Company Invited | 1 | 11.0 | Salaried | Male | 2 | 1.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 5 | 1 | 1.0 | Manager | NaN |
| 1790 | 0 | 27.0 | Self Enquiry | 3 | 15.0 | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 1 | 4 | 0 | 1.0 | Manager | NaN |
| 1807 | 0 | NaN | Self Enquiry | 1 | 15.0 | Salaried | Male | 1 | 4.0 | Basic | 3.0 | Single | 1.0 | 0 | 1 | 1 | 0.0 | Executive | NaN |
| 1843 | 0 | NaN | Self Enquiry | 1 | 6.0 | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Married | 2.0 | 0 | 3 | 0 | 0.0 | Executive | NaN |
| 1850 | 0 | 24.0 | Self Enquiry | 3 | 17.0 | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 1 | 1 | 0 | 0.0 | Manager | NaN |
| 1859 | 0 | NaN | Self Enquiry | 1 | 16.0 | Salaried | Male | 2 | 3.0 | Deluxe | 4.0 | Single | 3.0 | 1 | 5 | 0 | 1.0 | Manager | NaN |
| 1864 | 0 | 31.0 | Self Enquiry | 1 | 17.0 | Small Business | Female | 2 | 3.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 1 | 1 | 0.0 | Manager | NaN |
| 1868 | 0 | 37.0 | Company Invited | 1 | 17.0 | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Married | 4.0 | 1 | 3 | 1 | 2.0 | Manager | NaN |
| 1875 | 1 | NaN | Self Enquiry | 1 | 9.0 | Small Business | Male | 3 | 3.0 | Basic | 5.0 | Married | 6.0 | 0 | 5 | 1 | 1.0 | Executive | NaN |
| 1900 | 0 | 35.0 | Self Enquiry | 1 | 28.0 | Salaried | Male | 2 | 5.0 | Basic | 3.0 | Single | 1.0 | 0 | 4 | 1 | 1.0 | Executive | NaN |
| 1901 | 0 | NaN | Self Enquiry | 1 | 5.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 1.0 | 0 | 1 | 0 | 0.0 | Manager | NaN |
| 1913 | 1 | NaN | Company Invited | 1 | 10.0 | Large Business | Male | 3 | 4.0 | Basic | 3.0 | Single | 6.0 | 0 | 5 | 0 | 0.0 | Executive | NaN |
| 1914 | 0 | NaN | Self Enquiry | 3 | 8.0 | Small Business | Female | 2 | 3.0 | Deluxe | 3.0 | Married | 3.0 | 0 | 5 | 1 | 0.0 | Manager | NaN |
| 1919 | 0 | NaN | Company Invited | 1 | 14.0 | Salaried | Female | 2 | 3.0 | Basic | 3.0 | Married | 2.0 | 0 | 1 | 1 | 1.0 | Executive | NaN |
| 1924 | 0 | 29.0 | Self Enquiry | 1 | 16.0 | Large Business | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 5.0 | 0 | 5 | 1 | 0.0 | Manager | NaN |
| 1930 | 0 | 26.0 | Self Enquiry | 3 | 17.0 | Small Business | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 6.0 | 0 | 5 | 0 | 0.0 | Manager | NaN |
| 1952 | 1 | 31.0 | Self Enquiry | 3 | 17.0 | Small Business | Male | 2 | 3.0 | Deluxe | 5.0 | Married | 3.0 | 0 | 1 | 0 | 1.0 | Manager | NaN |
| 1958 | 0 | NaN | Self Enquiry | 1 | 8.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 1 | 0 | 0.0 | Manager | NaN |
| 1974 | 1 | 45.0 | Company Invited | 3 | 15.0 | Salaried | Female | 3 | 3.0 | Deluxe | 5.0 | Married | 3.0 | 0 | 1 | 0 | 2.0 | Manager | NaN |
| 1988 | 0 | NaN | Self Enquiry | 3 | 13.0 | Small Business | Female | 2 | 4.0 | Deluxe | 3.0 | Single | 1.0 | 0 | 1 | 1 | 0.0 | Manager | NaN |
| 2009 | 0 | NaN | Self Enquiry | 3 | 14.0 | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Married | 1.0 | 0 | 1 | 0 | 1.0 | Executive | NaN |
| 2040 | 0 | 52.0 | Company Invited | 1 | 16.0 | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Married | 1.0 | 0 | 3 | 1 | 0.0 | Executive | NaN |
| 2041 | 0 | 26.0 | NaN | 1 | 15.0 | Salaried | Female | 3 | 5.0 | Basic | 3.0 | Married | 4.0 | 0 | 4 | 1 | 0.0 | Executive | NaN |
| 2042 | 0 | 29.0 | NaN | 1 | 17.0 | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Married | 5.0 | 0 | 1 | 0 | 1.0 | Manager | NaN |
| 2046 | 0 | 27.0 | NaN | 3 | 17.0 | Small Business | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 3 | 1 | 1.0 | Manager | NaN |
| 2049 | 0 | 34.0 | NaN | 1 | 16.0 | Small Business | Female | 2 | 4.0 | Basic | 5.0 | Single | 2.0 | 0 | 1 | 1 | 0.0 | Executive | NaN |
| 2051 | 0 | NaN | Self Enquiry | 1 | 6.0 | Salaried | Male | 2 | 5.0 | Basic | 3.0 | Married | 4.0 | 0 | 4 | 1 | 0.0 | Executive | NaN |
| 2052 | 0 | 40.0 | Company Invited | 1 | 17.0 | Small Business | Female | 2 | 1.0 | Deluxe | 4.0 | Married | 2.0 | 0 | 3 | 0 | 0.0 | Manager | NaN |
| 2068 | 1 | 28.0 | NaN | 1 | 16.0 | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Single | 7.0 | 0 | 3 | 1 | 1.0 | Executive | NaN |
| 2074 | 0 | 42.0 | Self Enquiry | 1 | 15.0 | Salaried | Male | 3 | 3.0 | Deluxe | 4.0 | Married | 2.0 | 0 | 3 | 1 | 0.0 | Manager | NaN |
| 2083 | 0 | NaN | Self Enquiry | 2 | 9.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Married | 1.0 | 0 | 1 | 1 | 0.0 | Executive | NaN |
| 2089 | 0 | NaN | Self Enquiry | 3 | 6.0 | Small Business | Male | 2 | 1.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 4 | 0 | 1.0 | Manager | NaN |
| 2092 | 0 | 32.0 | NaN | 3 | 15.0 | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Married | 3.0 | 0 | 1 | 0 | 2.0 | Manager | NaN |
| 2093 | 0 | NaN | Company Invited | 1 | 11.0 | Salaried | Male | 3 | 4.0 | Basic | 3.0 | Married | 1.0 | 0 | 5 | 0 | 0.0 | Executive | NaN |
| 2104 | 0 | NaN | Self Enquiry | 3 | 9.0 | Salaried | Male | 3 | 3.0 | Deluxe | 5.0 | Married | 2.0 | 1 | 4 | 0 | 2.0 | Manager | NaN |
| 2108 | 0 | 25.0 | Self Enquiry | 3 | 17.0 | Small Business | Male | 2 | 4.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 3 | 1 | 1.0 | Manager | NaN |
| 2136 | 1 | NaN | Self Enquiry | 1 | 9.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Married | 1.0 | 1 | 5 | 1 | 1.0 | Manager | NaN |
| 2146 | 0 | NaN | Self Enquiry | 1 | 27.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 1 | 3 | 0 | 0.0 | Manager | NaN |
| 2155 | 0 | 26.0 | Company Invited | 3 | 17.0 | Small Business | Male | 2 | 4.0 | Deluxe | 5.0 | Single | 2.0 | 0 | 3 | 0 | 0.0 | Manager | NaN |
| 2166 | 0 | 45.0 | Company Invited | 1 | 15.0 | Salaried | Male | 2 | 3.0 | Deluxe | 4.0 | Married | 2.0 | 0 | 4 | 1 | 0.0 | Manager | NaN |
| 2189 | 0 | NaN | Self Enquiry | 3 | 10.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 1 | 3 | 1 | 1.0 | Manager | NaN |
| 2194 | 0 | 24.0 | NaN | 1 | 17.0 | Small Business | Female | 2 | 4.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 3 | 0 | 0.0 | Manager | NaN |
| 2195 | 1 | NaN | Self Enquiry | 1 | 20.0 | Salaried | Male | 2 | 4.0 | Basic | 4.0 | Married | 2.0 | 1 | 5 | 1 | 1.0 | Executive | NaN |
| 2196 | 0 | NaN | Company Invited | 1 | 6.0 | Salaried | Female | 3 | 3.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 5 | 1 | 0.0 | Manager | NaN |
| 2205 | 0 | 48.0 | Self Enquiry | 1 | 15.0 | Salaried | Male | 3 | 4.0 | Deluxe | 3.0 | Single | 3.0 | 0 | 1 | 0 | 2.0 | Manager | NaN |
| 2209 | 0 | 27.0 | Self Enquiry | 1 | 8.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 5 | 1 | 0.0 | Manager | NaN |
| 2210 | 0 | NaN | Self Enquiry | 1 | 16.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Married | 2.0 | 0 | 5 | 0 | 0.0 | Executive | NaN |
| 2226 | 0 | NaN | Company Invited | 1 | 35.0 | Small Business | Female | 3 | 3.0 | Basic | 3.0 | Single | 1.0 | 0 | 3 | 1 | 2.0 | Executive | NaN |
| 2237 | 0 | NaN | Self Enquiry | 1 | 9.0 | Salaried | Female | 2 | 3.0 | Deluxe | 4.0 | Single | 4.0 | 0 | 4 | 1 | 0.0 | Manager | NaN |
| 2251 | 0 | NaN | Self Enquiry | 1 | 6.0 | Small Business | Male | 2 | 4.0 | Basic | 5.0 | Married | 2.0 | 0 | 3 | 1 | 1.0 | Executive | NaN |
| 2272 | 0 | 33.0 | Company Invited | 3 | 15.0 | Salaried | Female | 1 | 3.0 | Deluxe | 4.0 | Married | 1.0 | 0 | 4 | 0 | 0.0 | Manager | NaN |
| 2292 | 0 | NaN | Company Invited | 1 | 8.0 | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Single | 3.0 | 0 | 4 | 1 | 1.0 | Manager | NaN |
| 2294 | 0 | 42.0 | Self Enquiry | 1 | 17.0 | Small Business | Male | 2 | 5.0 | Deluxe | 3.0 | Single | 5.0 | 0 | 3 | 1 | 1.0 | Manager | NaN |
| 2305 | 0 | 35.0 | Self Enquiry | 2 | 14.0 | Large Business | Male | 3 | 3.0 | Basic | 3.0 | Single | 2.0 | 0 | 5 | 0 | 1.0 | Executive | NaN |
| 2313 | 0 | 26.0 | NaN | 1 | 16.0 | Small Business | Male | 2 | 1.0 | Basic | 3.0 | Married | 2.0 | 0 | 5 | 1 | 1.0 | Executive | NaN |
| 2315 | 0 | 40.0 | Company Invited | 1 | 17.0 | Small Business | Female | 3 | 4.0 | Deluxe | 3.0 | Married | 4.0 | 1 | 1 | 0 | 1.0 | Manager | NaN |
| 2324 | 0 | 45.0 | Self Enquiry | 1 | 16.0 | Small Business | Female | 2 | 3.0 | Basic | 3.0 | Married | 5.0 | 1 | 3 | 0 | 1.0 | Executive | NaN |
| 2335 | 0 | NaN | Self Enquiry | 3 | 35.0 | Salaried | Male | 3 | 3.0 | Deluxe | 5.0 | Married | 1.0 | 0 | 1 | 1 | 0.0 | Manager | NaN |
| 2336 | 0 | 40.0 | Company Invited | 3 | 17.0 | Small Business | Male | 3 | 3.0 | Deluxe | 4.0 | Married | 6.0 | 0 | 4 | 1 | 1.0 | Manager | NaN |
| 2363 | 0 | NaN | Self Enquiry | 1 | 7.0 | Salaried | Female | 3 | 3.0 | Basic | 3.0 | Married | 2.0 | 0 | 1 | 1 | 2.0 | Executive | NaN |
| 2390 | 1 | 34.0 | Company Invited | 3 | 15.0 | Salaried | Female | 2 | 5.0 | Basic | 3.0 | Single | 2.0 | 0 | 3 | 0 | 1.0 | Executive | NaN |
| 2399 | 1 | NaN | Company Invited | 3 | 19.0 | Large Business | Female | 2 | 3.0 | Deluxe | 4.0 | Single | 6.0 | 0 | 3 | 1 | 0.0 | Manager | NaN |
| 2410 | 1 | NaN | Self Enquiry | 1 | 30.0 | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Married | 2.0 | 1 | 1 | 0 | 0.0 | Executive | NaN |
| 2430 | 1 | NaN | Self Enquiry | 1 | 14.0 | Small Business | Female | 3 | 3.0 | Basic | 5.0 | Married | 2.0 | 1 | 3 | 0 | 2.0 | Executive | NaN |
| 2431 | 1 | 35.0 | Company Invited | 1 | 16.0 | Small Business | Male | 3 | 3.0 | Basic | 4.0 | Married | 2.0 | 1 | 3 | 1 | 0.0 | Executive | NaN |
df.MonthlyIncome.mean()
23592.533619763693
# We'll impute the missing MonthlyIncome values with the mean MonthlyIncome of each Gender / Occupation / Designation group
df.groupby(["Gender", "Occupation", "Designation"], as_index=False)[
"MonthlyIncome"
].mean().round(0)
| | Gender | Occupation | Designation | MonthlyIncome |
|---|---|---|---|---|
| 0 | Female | Free Lancer | AVP | NaN |
| 1 | Female | Free Lancer | Executive | NaN |
| 2 | Female | Free Lancer | Manager | NaN |
| 3 | Female | Free Lancer | Senior Manager | NaN |
| 4 | Female | Free Lancer | VP | NaN |
| 5 | Female | Large Business | AVP | 31802.0 |
| 6 | Female | Large Business | Executive | 20146.0 |
| 7 | Female | Large Business | Manager | 22101.0 |
| 8 | Female | Large Business | Senior Manager | 27140.0 |
| 9 | Female | Large Business | VP | 36583.0 |
| 10 | Female | Salaried | AVP | 32358.0 |
| 11 | Female | Salaried | Executive | 19932.0 |
| 12 | Female | Salaried | Manager | 22476.0 |
| 13 | Female | Salaried | Senior Manager | 26895.0 |
| 14 | Female | Salaried | VP | 35911.0 |
| 15 | Female | Small Business | AVP | 32106.0 |
| 16 | Female | Small Business | Executive | 19859.0 |
| 17 | Female | Small Business | Manager | 22639.0 |
| 18 | Female | Small Business | Senior Manager | 26720.0 |
| 19 | Female | Small Business | VP | 35084.0 |
| 20 | Male | Free Lancer | AVP | NaN |
| 21 | Male | Free Lancer | Executive | 18929.0 |
| 22 | Male | Free Lancer | Manager | NaN |
| 23 | Male | Free Lancer | Senior Manager | NaN |
| 24 | Male | Free Lancer | VP | NaN |
| 25 | Male | Large Business | AVP | 29959.0 |
| 26 | Male | Large Business | Executive | 19894.0 |
| 27 | Male | Large Business | Manager | 22238.0 |
| 28 | Male | Large Business | Senior Manager | 26780.0 |
| 29 | Male | Large Business | VP | 36071.0 |
| 30 | Male | Salaried | AVP | 32554.0 |
| 31 | Male | Salaried | Executive | 19778.0 |
| 32 | Male | Salaried | Manager | 22889.0 |
| 33 | Male | Salaried | Senior Manager | 26292.0 |
| 34 | Male | Salaried | VP | 36089.0 |
| 35 | Male | Small Business | AVP | 32079.0 |
| 36 | Male | Small Business | Executive | 19829.0 |
| 37 | Male | Small Business | Manager | 22697.0 |
| 38 | Male | Small Business | Senior Manager | 26599.0 |
| 39 | Male | Small Business | VP | 35997.0 |
# Impute missing values of Monthly Income
df["MonthlyIncome"] = df.groupby(["Gender", "Occupation", "Designation"])[
"MonthlyIncome"
].transform(lambda x: round(x.fillna(x.mean())))
# Check that no rows with a missing MonthlyIncome remain (an empty result is expected)
df[df["MonthlyIncome"].isnull()]
| ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
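The group-wise imputation used above can be illustrated on a small toy frame. This is only a sketch under stated assumptions: the data is made up (not taken from the dataset), the `MonthlyIncome_imputed` column name is hypothetical, and the fallback to the overall mean is an extra step that the notebook itself does not perform. It may be useful if a group ever has no observed income at all, in which case the group mean is NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration only, not rows from the real dataset
toy = pd.DataFrame(
    {
        "Gender": ["Male", "Male", "Male", "Female", "Female", "Female"],
        "Designation": ["Manager", "Manager", "Manager", "Executive", "Executive", "VP"],
        "MonthlyIncome": [22000.0, np.nan, 24000.0, 20000.0, np.nan, np.nan],
    }
)

# Mean income of each row's group, aligned with the original row order
group_mean = toy.groupby(["Gender", "Designation"])["MonthlyIncome"].transform("mean")

# Overall mean of the observed incomes, used only as a fallback (an assumption,
# not part of the notebook) for groups whose incomes are all missing
overall_mean = toy["MonthlyIncome"].mean()

# Fill from the group mean first, then from the overall mean, and round
toy["MonthlyIncome_imputed"] = (
    toy["MonthlyIncome"].fillna(group_mean).fillna(overall_mean).round(0)
)

print(toy)
```

Here the missing Male / Manager income is filled with that group's mean, while the Female / VP row, whose group has no observed income, falls back to the overall mean.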
# Look at the rows where Age is null
df[df["Age"].isnull()]
| | ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 0 | NaN | Self Enquiry | 1 | 8.0 | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Divorced | 1.0 | 0 | 5 | 1 | 0.0 | Executive | 18468.0 |
| 11 | 0 | NaN | Self Enquiry | 1 | 21.0 | Salaried | Female | 2 | 4.0 | Deluxe | 3.0 | Single | 1.0 | 1 | 3 | 0 | 0.0 | Manager | 22476.0 |
| 19 | 0 | NaN | Self Enquiry | 1 | 8.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Single | 6.0 | 1 | 4 | 0 | 1.0 | Executive | 19778.0 |
| 20 | 0 | NaN | Company Invited | 1 | 17.0 | Salaried | Female | 3 | 2.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 3 | 1 | 2.0 | Manager | 22476.0 |
| 21 | 1 | NaN | Self Enquiry | 3 | 15.0 | Salaried | Male | 2 | 4.0 | Deluxe | 5.0 | Single | 1.0 | 0 | 2 | 0 | 0.0 | Manager | 18407.0 |
| 26 | 1 | NaN | Company Invited | 1 | 22.0 | Salaried | Female | 3 | 5.0 | Basic | 5.0 | Single | 2.0 | 1 | 4 | 1 | 2.0 | Executive | 19932.0 |
| 44 | 0 | NaN | Company Invited | 1 | 6.0 | Small Business | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 3 | 1 | 0.0 | Manager | 22639.0 |
| 51 | 1 | NaN | Self Enquiry | 1 | 11.0 | Large Business | Male | 2 | 3.0 | Basic | 3.0 | Single | 2.0 | 1 | 2 | 1 | 0.0 | Executive | 18441.0 |
| 54 | 0 | NaN | Self Enquiry | 3 | 29.0 | Small Business | Female | 2 | 4.0 | Deluxe | 3.0 | Divorced | 1.0 | 1 | 2 | 1 | 0.0 | Manager | 22639.0 |
| 57 | 0 | NaN | Self Enquiry | 1 | 29.0 | Small Business | Female | 1 | 3.0 | Basic | 5.0 | Divorced | 4.0 | 1 | 4 | 1 | 0.0 | Executive | 19859.0 |
| 69 | 1 | NaN | Self Enquiry | 1 | 15.0 | Small Business | Male | 3 | 4.0 | Basic | 3.0 | Divorced | 1.0 | 1 | 2 | 0 | 1.0 | Executive | 18388.0 |
| 88 | 0 | NaN | Self Enquiry | 1 | 8.0 | Salaried | Male | 2 | 4.0 | Deluxe | 3.0 | Divorced | 2.0 | 0 | 4 | 1 | 0.0 | Manager | 22889.0 |
| 97 | 0 | NaN | Company Invited | 3 | 10.0 | Small Business | Male | 2 | 3.0 | Deluxe | 3.0 | Divorced | 2.0 | 0 | 2 | 1 | 0.0 | Manager | 22697.0 |
| 140 | 1 | NaN | Self Enquiry | 1 | 15.0 | Small Business | Female | 2 | 3.0 | Basic | 5.0 | Single | 1.0 | 0 | 4 | 1 | 1.0 | Executive | 19859.0 |
| 141 | 0 | NaN | Self Enquiry | 1 | 35.0 | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Single | 6.0 | 0 | 2 | 1 | 0.0 | Executive | 18452.0 |
| 180 | 0 | NaN | Self Enquiry | 1 | 18.0 | Small Business | Female | 3 | 3.0 | Basic | 3.0 | Divorced | 1.0 | 1 | 4 | 1 | 2.0 | Executive | 19859.0 |
| 183 | 0 | NaN | Self Enquiry | 1 | 6.0 | Small Business | Male | 2 | 4.0 | Basic | 3.0 | Divorced | 3.0 | 0 | 3 | 0 | 0.0 | Executive | 18690.0 |
| 195 | 0 | NaN | Self Enquiry | 1 | 27.0 | Salaried | Male | 3 | 2.0 | Basic | 5.0 | Divorced | 2.0 | 1 | 3 | 0 | 2.0 | Executive | 18564.0 |
| 202 | 0 | NaN | Company Invited | 1 | 16.0 | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Married | 2.0 | 1 | 3 | 1 | 0.0 | Executive | 19829.0 |
| 238 | 0 | NaN | Self Enquiry | 3 | 10.0 | Salaried | Female | 2 | 3.0 | Basic | 4.0 | Divorced | 3.0 | 0 | 3 | 1 | 0.0 | Executive | 19932.0 |
| 239 | 1 | NaN | Self Enquiry | 1 | 6.0 | Salaried | Male | 3 | 4.0 | Basic | 3.0 | Single | 1.0 | 1 | 5 | 1 | 0.0 | Executive | 19778.0 |
| 248 | 0 | NaN | Self Enquiry | 1 | 6.0 | Small Business | Female | 2 | 4.0 | Basic | 5.0 | Divorced | 3.0 | 0 | 4 | 1 | 1.0 | Executive | 19859.0 |
| 259 | 1 | NaN | Company Invited | 1 | 35.0 | Small Business | Male | 3 | 4.0 | Basic | 4.0 | Single | 1.0 | 0 | 3 | 1 | 2.0 | Executive | 18479.0 |
| 264 | 1 | NaN | Self Enquiry | 1 | 8.0 | Salaried | Male | 3 | 3.0 | Basic | 3.0 | Single | 3.0 | 0 | 4 | 1 | 1.0 | Executive | 18485.0 |
| 267 | 0 | NaN | Company Invited | 1 | 11.0 | Salaried | Male | 2 | 2.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 5 | 1 | 1.0 | Manager | 22889.0 |
| 298 | 0 | NaN | Company Invited | 1 | 24.0 | Salaried | Male | 2 | 2.0 | Basic | 3.0 | Divorced | 5.0 | 0 | 5 | 0 | 0.0 | Executive | 18688.0 |
| 323 | 1 | NaN | Self Enquiry | 1 | 8.0 | Small Business | Male | 2 | 5.0 | Basic | 3.0 | Divorced | 6.0 | 1 | 3 | 1 | 0.0 | Executive | 18464.0 |
| 334 | 0 | NaN | Self Enquiry | 1 | 14.0 | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Divorced | 1.0 | 0 | 3 | 1 | 0.0 | Manager | 18697.0 |
| 337 | 0 | NaN | Self Enquiry | 1 | 15.0 | Salaried | Male | 1 | 4.0 | Basic | 3.0 | Single | 1.0 | 0 | 2 | 1 | 0.0 | Executive | 19778.0 |
| 364 | 0 | NaN | Self Enquiry | 1 | 16.0 | Small Business | Female | 3 | 3.0 | Basic | 5.0 | Divorced | 7.0 | 0 | 2 | 1 | 1.0 | Executive | 18452.0 |
| 373 | 0 | NaN | Self Enquiry | 1 | 6.0 | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Married | 2.0 | 0 | 3 | 0 | 0.0 | Executive | 19829.0 |
| 389 | 0 | NaN | Self Enquiry | 1 | 16.0 | Salaried | Male | 2 | 3.0 | Deluxe | 4.0 | Single | 3.0 | 1 | 5 | 0 | 1.0 | Manager | 22889.0 |
| 391 | 0 | NaN | Self Enquiry | 1 | 8.0 | Small Business | Female | 3 | 4.0 | Deluxe | 3.0 | Divorced | 7.0 | 0 | 5 | 1 | 0.0 | Manager | 18448.0 |
| 405 | 1 | NaN | Self Enquiry | 1 | 9.0 | Small Business | Male | 3 | 3.0 | Basic | 5.0 | Divorced | 6.0 | 0 | 5 | 1 | 1.0 | Executive | 19829.0 |
| 431 | 0 | NaN | Self Enquiry | 1 | 14.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 1.0 | 0 | 2 | 1 | 0.0 | Manager | 22476.0 |
| 436 | 1 | NaN | Self Enquiry | 1 | 16.0 | Small Business | Male | 2 | 5.0 | Basic | 3.0 | Married | 1.0 | 0 | 5 | 0 | 0.0 | Executive | 18408.0 |
| 443 | 1 | NaN | Company Invited | 1 | 10.0 | Large Business | Male | 2 | 4.0 | Basic | 3.0 | Single | 6.0 | 0 | 5 | 1 | 0.0 | Executive | 19894.0 |
| 444 | 0 | NaN | Self Enquiry | 3 | 8.0 | Small Business | Female | 2 | 3.0 | Deluxe | 3.0 | Divorced | 3.0 | 0 | 5 | 1 | 0.0 | Manager | 22639.0 |
| 449 | 0 | NaN | Company Invited | 1 | 14.0 | Salaried | Female | 2 | 3.0 | Basic | 3.0 | Divorced | 2.0 | 0 | 2 | 0 | 0.0 | Executive | 19932.0 |
| 481 | 0 | NaN | Self Enquiry | 1 | 6.0 | Salaried | Male | 2 | 4.0 | Basic | 3.0 | Divorced | 2.0 | 1 | 2 | 0 | 0.0 | Executive | 18622.0 |
| 483 | 0 | NaN | Self Enquiry | 1 | 31.0 | Salaried | Male | 2 | 4.0 | Deluxe | 3.0 | Single | 5.0 | 0 | 2 | 0 | 1.0 | Manager | 18681.0 |
| 488 | 0 | NaN | Self Enquiry | 1 | 8.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Divorced | 2.0 | 0 | 2 | 1 | 0.0 | Manager | 22476.0 |
| 496 | 0 | NaN | Self Enquiry | 3 | 28.0 | Large Business | Male | 2 | 3.0 | Basic | 3.0 | Single | 2.0 | 0 | 2 | 1 | 1.0 | Executive | 18447.0 |
| 518 | 0 | NaN | Self Enquiry | 3 | 13.0 | Small Business | Female | 2 | 4.0 | Deluxe | 3.0 | Single | 1.0 | 0 | 2 | 1 | 1.0 | Manager | 22639.0 |
| 539 | 0 | NaN | Self Enquiry | 3 | 14.0 | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Divorced | 1.0 | 0 | 2 | 1 | 0.0 | Executive | 19829.0 |
| 543 | 0 | NaN | Company Invited | 1 | 30.0 | Small Business | Male | 2 | 5.0 | Basic | 3.0 | Single | 3.0 | 0 | 3 | 0 | 1.0 | Executive | 18708.0 |
| 565 | 0 | NaN | Self Enquiry | 1 | 16.0 | Small Business | Male | 3 | 2.0 | Basic | 3.0 | Single | 2.0 | 0 | 2 | 1 | 1.0 | Executive | 18505.0 |
| 581 | 0 | NaN | Self Enquiry | 1 | 6.0 | Salaried | Male | 2 | 5.0 | Basic | 3.0 | Divorced | 4.0 | 0 | 4 | 1 | 0.0 | Executive | 19778.0 |
| 613 | 0 | NaN | Self Enquiry | 2 | 9.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Divorced | 1.0 | 0 | 2 | 1 | 1.0 | Executive | 19778.0 |
| 618 | 0 | NaN | Self Enquiry | 1 | 8.0 | Small Business | Male | 3 | 1.0 | Basic | 5.0 | Single | 1.0 | 0 | 2 | 1 | 2.0 | Executive | 18424.0 |
| 619 | 0 | NaN | Self Enquiry | 3 | 6.0 | Small Business | Male | 2 | 1.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 4 | 0 | 0.0 | Manager | 22697.0 |
| 623 | 0 | NaN | Company Invited | 1 | 11.0 | Salaried | Male | 3 | 4.0 | Basic | 3.0 | Married | 1.0 | 0 | 5 | 0 | 0.0 | Executive | 19778.0 |
| 634 | 0 | NaN | Self Enquiry | 3 | 9.0 | Salaried | Male | 3 | 3.0 | Deluxe | 5.0 | Divorced | 2.0 | 1 | 4 | 1 | 0.0 | Manager | 22889.0 |
| 639 | 0 | NaN | Self Enquiry | 1 | 6.0 | Large Business | Female | 2 | 3.0 | Basic | 5.0 | Divorced | 3.0 | 0 | 4 | 1 | 0.0 | Executive | 18580.0 |
| 666 | 1 | NaN | Self Enquiry | 1 | 9.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Divorced | 1.0 | 1 | 5 | 1 | 1.0 | Manager | 22476.0 |
| 676 | 0 | NaN | Self Enquiry | 1 | 27.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Divorced | 2.0 | 1 | 3 | 0 | 1.0 | Manager | 22476.0 |
| 691 | 0 | NaN | Company Invited | 1 | 15.0 | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Married | 1.0 | 1 | 4 | 0 | 1.0 | Executive | 18617.0 |
| 712 | 0 | NaN | Self Enquiry | 1 | 19.0 | Salaried | Female | 2 | 3.0 | Basic | 3.0 | Single | 4.0 | 0 | 3 | 1 | 0.0 | Executive | 18452.0 |
| 718 | 0 | NaN | Company Invited | 1 | 29.0 | Salaried | Male | 2 | 3.0 | Deluxe | 5.0 | Divorced | 2.0 | 1 | 2 | 1 | 0.0 | Manager | 18633.0 |
| 719 | 0 | NaN | Self Enquiry | 3 | 10.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 1 | 3 | 1 | 1.0 | Manager | 22476.0 |
| 725 | 1 | NaN | Self Enquiry | 1 | 20.0 | Salaried | Male | 2 | 4.0 | Basic | 4.0 | Married | 2.0 | 1 | 5 | 1 | 1.0 | Executive | 19778.0 |
| 726 | 0 | NaN | Company Invited | 1 | 6.0 | Salaried | Female | 3 | 3.0 | Deluxe | 5.0 | Divorced | 2.0 | 0 | 5 | 1 | 2.0 | Manager | 22476.0 |
| 740 | 0 | NaN | Self Enquiry | 1 | 16.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Married | 2.0 | 0 | 5 | 1 | 0.0 | Executive | 19778.0 |
| 756 | 0 | NaN | Company Invited | 1 | 35.0 | Small Business | Female | 3 | 3.0 | Basic | 3.0 | Single | 1.0 | 0 | 3 | 1 | 0.0 | Executive | 19859.0 |
| 767 | 0 | NaN | Self Enquiry | 1 | 9.0 | Salaried | Female | 2 | 3.0 | Deluxe | 4.0 | Single | 4.0 | 0 | 4 | 1 | 0.0 | Manager | 22476.0 |
| 781 | 0 | NaN | Self Enquiry | 1 | 6.0 | Small Business | Male | 2 | 4.0 | Basic | 5.0 | Divorced | 2.0 | 0 | 3 | 0 | 1.0 | Executive | 19829.0 |
| 783 | 0 | NaN | Self Enquiry | 1 | 13.0 | Large Business | Female | 2 | 1.0 | Basic | 3.0 | Divorced | 2.0 | 0 | 3 | 1 | 1.0 | Executive | 18376.0 |
| 788 | 0 | NaN | Self Enquiry | 1 | 16.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 4.0 | 0 | 4 | 1 | 1.0 | Manager | 18660.0 |
| 796 | 1 | NaN | Self Enquiry | 1 | 10.0 | Large Business | Male | 2 | 3.0 | Basic | 3.0 | Divorced | 2.0 | 1 | 3 | 1 | 0.0 | Executive | 18691.0 |
| 822 | 0 | NaN | Company Invited | 1 | 8.0 | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Single | 3.0 | 0 | 4 | 1 | 2.0 | Manager | 22889.0 |
| 841 | 0 | NaN | Self Enquiry | 1 | 30.0 | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Single | 1.0 | 0 | 5 | 1 | 0.0 | Executive | 18597.0 |
| 863 | 0 | NaN | Self Enquiry | 2 | 8.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Divorced | 2.0 | 0 | 2 | 0 | 0.0 | Executive | 18600.0 |
| 865 | 0 | NaN | Self Enquiry | 3 | 35.0 | Salaried | Male | 3 | 3.0 | Deluxe | 5.0 | Married | 1.0 | 0 | 2 | 1 | 1.0 | Manager | 22889.0 |
| 874 | 0 | NaN | Self Enquiry | 1 | 13.0 | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 5 | 1 | 1.0 | Manager | 18491.0 |
| 886 | 0 | NaN | Self Enquiry | 1 | 6.0 | Small Business | Male | 3 | 3.0 | Basic | 4.0 | Divorced | 1.0 | 1 | 4 | 1 | 2.0 | Executive | 18579.0 |
| 893 | 0 | NaN | Self Enquiry | 1 | 6.0 | Salaried | Female | 3 | 3.0 | Basic | 3.0 | Married | 2.0 | 0 | 2 | 0 | 2.0 | Executive | 19932.0 |
| 900 | 0 | NaN | Company Invited | 1 | 9.0 | Large Business | Male | 2 | 3.0 | Basic | 4.0 | Divorced | 2.0 | 0 | 3 | 1 | 1.0 | Executive | 18692.0 |
| 924 | 0 | NaN | Self Enquiry | 1 | 12.0 | Salaried | Male | 3 | 3.0 | Basic | 3.0 | Divorced | 1.0 | 1 | 3 | 1 | 1.0 | Executive | 18506.0 |
| 929 | 0 | NaN | Company Invited | 1 | 8.0 | Salaried | Male | 2 | 4.0 | Basic | 3.0 | Divorced | 2.0 | 1 | 4 | 1 | 0.0 | Executive | 19778.0 |
| 940 | 1 | NaN | Self Enquiry | 1 | 29.0 | Small Business | Male | 3 | 3.0 | Basic | 5.0 | Single | 1.0 | 0 | 3 | 0 | 2.0 | Executive | 19829.0 |
| 943 | 0 | NaN | Self Enquiry | 2 | 6.0 | Salaried | Female | 2 | 3.0 | Basic | 5.0 | Single | 7.0 | 0 | 5 | 1 | 0.0 | Executive | 18423.0 |
| 957 | 0 | NaN | Company Invited | 1 | 22.0 | Salaried | Male | 3 | 3.0 | Basic | 3.0 | Married | 1.0 | 0 | 3 | 1 | 2.0 | Executive | 18544.0 |
| 960 | 0 | NaN | Company Invited | 3 | 6.0 | Small Business | Female | 3 | 3.0 | Deluxe | 5.0 | Married | 1.0 | 0 | 3 | 0 | 1.0 | Manager | 22639.0 |
| 965 | 0 | NaN | Self Enquiry | 1 | 25.0 | Small Business | Male | 3 | 3.0 | Basic | 3.0 | Divorced | 4.0 | 0 | 5 | 1 | 2.0 | Executive | 18669.0 |
| 991 | 0 | NaN | Self Enquiry | 3 | 8.0 | Small Business | Male | 2 | 3.0 | Deluxe | 4.0 | Divorced | 1.0 | 1 | 1 | 0 | 1.0 | Manager | 22697.0 |
| 995 | 0 | NaN | Self Enquiry | 1 | 12.0 | Small Business | Female | 3 | 4.0 | Deluxe | 3.0 | Single | 2.0 | 1 | 4 | 1 | 1.0 | Manager | 22639.0 |
| 998 | 0 | NaN | Self Enquiry | 1 | 8.0 | Small Business | Male | 2 | 4.0 | Basic | 3.0 | Single | 1.0 | 0 | 3 | 0 | 0.0 | Executive | 19829.0 |
| 1001 | 0 | NaN | Self Enquiry | 1 | 17.0 | Small Business | Female | 3 | 3.0 | Basic | 3.0 | Single | 5.0 | 0 | 3 | 0 | 1.0 | Executive | 18629.0 |
| 1004 | 0 | NaN | Self Enquiry | 1 | 13.0 | Salaried | Male | 3 | 1.0 | Basic | 5.0 | Single | 1.0 | 0 | 1 | 1 | 0.0 | Executive | 18578.0 |
| 1020 | 0 | NaN | Self Enquiry | 1 | 6.0 | Large Business | Male | 3 | 3.0 | Basic | 3.0 | Divorced | 1.0 | 0 | 3 | 1 | 0.0 | Executive | 18420.0 |
| 1022 | 0 | NaN | Company Invited | 1 | 11.0 | Large Business | Male | 2 | 1.0 | Basic | 3.0 | Single | 1.0 | 0 | 5 | 1 | 0.0 | Executive | 18500.0 |
| 1025 | 0 | NaN | Self Enquiry | 3 | 10.0 | Small Business | Female | 2 | 3.0 | Deluxe | 3.0 | Divorced | 2.0 | 1 | 5 | 1 | 0.0 | Manager | 22639.0 |
| 1029 | 0 | NaN | Company Invited | 1 | 15.0 | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 4.0 | 0 | 3 | 1 | 0.0 | Manager | 22889.0 |
| 1032 | 1 | NaN | Company Invited | 1 | 8.0 | Salaried | Female | 2 | 4.0 | Basic | 5.0 | Single | 3.0 | 1 | 3 | 0 | 0.0 | Executive | 18646.0 |
| 1036 | 1 | NaN | Company Invited | 1 | 8.0 | Salaried | Male | 3 | 3.0 | Basic | 3.0 | Divorced | 7.0 | 1 | 3 | 1 | 1.0 | Executive | 19778.0 |
| 1046 | 0 | NaN | Self Enquiry | 1 | 26.0 | Salaried | Male | 2 | 3.0 | Basic | 4.0 | Single | 2.0 | 0 | 5 | 1 | 1.0 | Executive | 18420.0 |
| 1050 | 0 | NaN | Company Invited | 1 | 15.0 | Small Business | Female | 3 | 3.0 | Basic | 3.0 | Single | 2.0 | 0 | 4 | 1 | 1.0 | Executive | 18673.0 |
| 1065 | 0 | NaN | Self Enquiry | 1 | 10.0 | Salaried | Male | 1 | 3.0 | Deluxe | 3.0 | Divorced | 1.0 | 1 | 4 | 0 | 0.0 | Manager | 22889.0 |
| 1066 | 0 | NaN | Self Enquiry | 1 | 10.0 | Small Business | Female | 2 | 4.0 | Basic | 4.0 | Divorced | 1.0 | 0 | 1 | 0 | 0.0 | Executive | 19859.0 |
| 1085 | 1 | NaN | Company Invited | 1 | 9.0 | Salaried | Female | 2 | 3.0 | Basic | 3.0 | Single | 2.0 | 0 | 4 | 1 | 1.0 | Executive | 19932.0 |
| 1098 | 0 | NaN | Company Invited | 1 | 14.0 | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Married | 4.0 | 1 | 3 | 1 | 1.0 | Manager | 22889.0 |
| 1113 | 0 | NaN | Company Invited | 1 | 6.0 | Large Business | Male | 3 | 3.0 | Deluxe | 5.0 | Married | 5.0 | 0 | 4 | 1 | 2.0 | Manager | 22238.0 |
| 1120 | 0 | NaN | Self Enquiry | 3 | 22.0 | Salaried | Female | 2 | 3.0 | Deluxe | 4.0 | Single | 3.0 | 0 | 1 | 0 | 0.0 | Manager | 22476.0 |
| 1130 | 0 | NaN | Self Enquiry | 1 | 34.0 | Salaried | Male | 2 | 1.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 5 | 1 | 0.0 | Manager | 18407.0 |
| 1149 | 0 | NaN | Self Enquiry | 1 | 25.0 | Salaried | Male | 3 | 4.0 | Basic | 5.0 | Married | 2.0 | 0 | 1 | 0 | 0.0 | Executive | 19778.0 |
| 1157 | 0 | NaN | Company Invited | 1 | 14.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 5 | 1 | 1.0 | Manager | 22476.0 |
| 1163 | 0 | NaN | Self Enquiry | 1 | 16.0 | Small Business | Female | 3 | 3.0 | Basic | 4.0 | Married | 2.0 | 0 | 1 | 1 | 0.0 | Executive | 19859.0 |
| 1168 | 0 | NaN | Company Invited | 1 | 8.0 | Large Business | Female | 2 | 3.0 | Basic | 3.0 | Single | 2.0 | 1 | 3 | 1 | 1.0 | Executive | 20146.0 |
| 1169 | 0 | NaN | Self Enquiry | 1 | 14.0 | Small Business | Female | 2 | 1.0 | Basic | 3.0 | Married | 1.0 | 0 | 4 | 0 | 0.0 | Executive | 18517.0 |
| 1188 | 0 | NaN | Self Enquiry | 3 | 11.0 | Small Business | Male | 2 | 4.0 | Deluxe | 4.0 | Married | 2.0 | 1 | 5 | 0 | 0.0 | Manager | 22697.0 |
| 1201 | 1 | NaN | Self Enquiry | 1 | 14.0 | Small Business | Male | 3 | 4.0 | Basic | 3.0 | Single | 2.0 | 1 | 4 | 1 | 2.0 | Executive | 19829.0 |
| 1207 | 0 | NaN | Self Enquiry | 1 | 28.0 | Large Business | Male | 3 | 4.0 | Basic | 4.0 | Married | 6.0 | 0 | 1 | 1 | 1.0 | Executive | 18537.0 |
| 1208 | 0 | NaN | Self Enquiry | 1 | 11.0 | Small Business | Male | 2 | 4.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 4 | 1 | 1.0 | Manager | 22697.0 |
| 1226 | 0 | NaN | Company Invited | 1 | 16.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Married | 7.0 | 0 | 1 | 0 | 0.0 | Executive | 18433.0 |
| 1227 | 0 | NaN | Self Enquiry | 1 | 8.0 | Salaried | Male | 3 | 3.0 | Basic | 3.0 | Married | 2.0 | 0 | 5 | 0 | 2.0 | Executive | 18477.0 |
| 1230 | 0 | NaN | Self Enquiry | 1 | 35.0 | Small Business | Male | 3 | 3.0 | Basic | 5.0 | Married | 2.0 | 0 | 4 | 1 | 1.0 | Executive | 19829.0 |
| 1248 | 0 | NaN | Self Enquiry | 1 | 14.0 | Small Business | Female | 3 | 3.0 | Basic | 3.0 | Single | 2.0 | 0 | 4 | 1 | 2.0 | Executive | 18445.0 |
| 1267 | 0 | NaN | Company Invited | 3 | 16.0 | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 1 | 1 | 1 | 1.0 | Manager | 22889.0 |
| 1269 | 0 | NaN | Self Enquiry | 2 | 8.0 | Salaried | Male | 3 | 3.0 | Basic | 3.0 | Single | 1.0 | 0 | 1 | 0 | 0.0 | Executive | 18539.0 |
| 1272 | 0 | NaN | Self Enquiry | 1 | 12.0 | Salaried | Female | 2 | 4.0 | Basic | 3.0 | Married | 2.0 | 0 | 3 | 0 | 1.0 | Executive | 18702.0 |
| 1276 | 0 | NaN | Self Enquiry | 3 | 15.0 | Small Business | Male | 2 | 4.0 | Deluxe | 4.0 | Married | 2.0 | 0 | 4 | 0 | 1.0 | Manager | 22697.0 |
| 1280 | 0 | NaN | Self Enquiry | 2 | 14.0 | Salaried | Male | 2 | 3.0 | Deluxe | 4.0 | Married | 3.0 | 0 | 3 | 1 | 1.0 | Manager | 22889.0 |
| 1286 | 0 | NaN | Self Enquiry | 1 | 8.0 | Salaried | Female | 2 | 3.0 | Basic | 5.0 | Married | 5.0 | 0 | 4 | 1 | 1.0 | Executive | 18377.0 |
| 1291 | 1 | NaN | Self Enquiry | 1 | 16.0 | Small Business | Male | 2 | 3.0 | Deluxe | 5.0 | Single | 2.0 | 0 | 4 | 0 | 0.0 | Manager | 22697.0 |
| 1292 | 0 | NaN | Company Invited | 3 | 26.0 | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 3.0 | 0 | 3 | 1 | 0.0 | Manager | 22889.0 |
| 1307 | 0 | NaN | Self Enquiry | 1 | 6.0 | Small Business | Female | 3 | 3.0 | Basic | 5.0 | Married | 2.0 | 0 | 1 | 1 | 0.0 | Executive | 18591.0 |
| 1318 | 0 | NaN | Company Invited | 1 | 26.0 | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Married | 2.0 | 0 | 3 | 1 | 1.0 | Executive | 19829.0 |
| 1325 | 0 | NaN | Self Enquiry | 1 | 14.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Single | 5.0 | 0 | 1 | 1 | 1.0 | Executive | 19778.0 |
| 1328 | 0 | NaN | Self Enquiry | 3 | 29.0 | Small Business | Female | 2 | 4.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 1 | 1 | 1.0 | Manager | 18540.0 |
| 1335 | 0 | NaN | Self Enquiry | 1 | 25.0 | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 5 | 1 | 0.0 | Manager | 22889.0 |
| 1341 | 0 | NaN | Self Enquiry | 1 | 26.0 | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 5 | 0 | 0.0 | Manager | 22889.0 |
| 1347 | 0 | NaN | Company Invited | 2 | 8.0 | Salaried | Male | 3 | 4.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 3 | 1 | 0.0 | Manager | 22889.0 |
| 1360 | 0 | NaN | Self Enquiry | 1 | 10.0 | Small Business | Female | 3 | 1.0 | Basic | 3.0 | Married | 1.0 | 0 | 5 | 1 | 2.0 | Executive | 19859.0 |
| 1393 | 0 | NaN | Self Enquiry | 3 | 15.0 | Small Business | Male | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 1 | 0 | 1.0 | Manager | 22697.0 |
| 1412 | 0 | NaN | Self Enquiry | 1 | 6.0 | Small Business | Male | 3 | 3.0 | Basic | 4.0 | Married | 2.0 | 0 | 5 | 0 | 1.0 | Executive | 19829.0 |
| 1413 | 0 | NaN | Self Enquiry | 1 | 8.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Married | 7.0 | 1 | 5 | 0 | 0.0 | Executive | 19778.0 |
| 1423 | 0 | NaN | Self Enquiry | 1 | 6.0 | Salaried | Male | 3 | 3.0 | Basic | 3.0 | Single | 1.0 | 0 | 3 | 0 | 2.0 | Executive | 18375.0 |
| 1429 | 0 | NaN | Self Enquiry | 1 | 30.0 | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Single | 1.0 | 0 | 4 | 1 | 0.0 | Manager | 22889.0 |
| 1434 | 0 | NaN | Company Invited | 3 | 26.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Married | 3.0 | 0 | 1 | 1 | 1.0 | Executive | 18482.0 |
| 1459 | 0 | NaN | Self Enquiry | 1 | 19.0 | Salaried | Male | 2 | 4.0 | Deluxe | 4.0 | Married | 5.0 | 1 | 4 | 1 | 0.0 | Manager | 22889.0 |
| 1460 | 0 | NaN | Self Enquiry | 1 | 34.0 | Small Business | Female | 3 | 4.0 | Basic | 5.0 | Single | 2.0 | 0 | 1 | 0 | 1.0 | Executive | 19859.0 |
| 1474 | 0 | NaN | Self Enquiry | 1 | 8.0 | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Married | 1.0 | 0 | 5 | 1 | 1.0 | Executive | 18468.0 |
| 1481 | 0 | NaN | Self Enquiry | 1 | 21.0 | Salaried | Female | 2 | 4.0 | Deluxe | 3.0 | Single | 1.0 | 1 | 1 | 1 | 1.0 | Manager | 22476.0 |
| 1489 | 0 | NaN | Self Enquiry | 1 | 8.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Single | 6.0 | 1 | 1 | 0 | 0.0 | Executive | 19778.0 |
| 1490 | 0 | NaN | Company Invited | 1 | 17.0 | Salaried | Female | 3 | 1.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 5 | 0 | 1.0 | Manager | 22476.0 |
| 1491 | 1 | NaN | Self Enquiry | 3 | 15.0 | Salaried | Male | 2 | 4.0 | Basic | 5.0 | Single | 1.0 | 0 | 1 | 0 | 0.0 | Executive | 18407.0 |
| 1496 | 1 | NaN | Company Invited | 1 | 22.0 | Salaried | Female | 3 | 5.0 | Basic | 5.0 | Single | 2.0 | 1 | 4 | 0 | 0.0 | Executive | 19932.0 |
| 1508 | 0 | NaN | Self Enquiry | 1 | 11.0 | Salaried | Female | 2 | 4.0 | Basic | 5.0 | Married | 1.0 | 1 | 1 | 1 | 1.0 | Executive | 18419.0 |
| 1514 | 0 | NaN | Company Invited | 1 | 6.0 | Small Business | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 3 | 1 | 0.0 | Manager | 22639.0 |
| 1521 | 1 | NaN | Self Enquiry | 1 | 11.0 | Large Business | Male | 2 | 3.0 | Basic | 3.0 | Single | 2.0 | 1 | 1 | 0 | 0.0 | Executive | 18441.0 |
| 1524 | 0 | NaN | Self Enquiry | 3 | 29.0 | Small Business | Female | 2 | 4.0 | Deluxe | 3.0 | Married | 1.0 | 1 | 1 | 0 | 0.0 | Manager | 22639.0 |
| 1527 | 0 | NaN | Self Enquiry | 1 | 29.0 | Small Business | Female | 1 | 3.0 | Basic | 5.0 | Married | 4.0 | 1 | 4 | 0 | 0.0 | Executive | 19859.0 |
| 1539 | 1 | NaN | Self Enquiry | 1 | 15.0 | Small Business | Male | 3 | 4.0 | Basic | 3.0 | Married | 1.0 | 1 | 1 | 1 | 2.0 | Executive | 18388.0 |
| 1558 | 0 | NaN | Self Enquiry | 1 | 8.0 | Salaried | Male | 4 | 4.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 4 | 1 | 1.0 | Manager | 22889.0 |
| 1610 | 1 | NaN | Self Enquiry | 1 | 15.0 | Small Business | Female | 2 | 3.0 | Basic | 5.0 | Single | 1.0 | 0 | 4 | 1 | 0.0 | Executive | 19859.0 |
| 1611 | 0 | NaN | Self Enquiry | 1 | 35.0 | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Single | 6.0 | 0 | 1 | 1 | 0.0 | Executive | 18452.0 |
| 1650 | 0 | NaN | Self Enquiry | 1 | 18.0 | Small Business | Female | 3 | 3.0 | Basic | 3.0 | Married | 1.0 | 1 | 4 | 1 | 1.0 | Executive | 19859.0 |
| 1653 | 0 | NaN | Self Enquiry | 1 | 6.0 | Small Business | Male | 2 | 4.0 | Basic | 3.0 | Married | 3.0 | 0 | 3 | 1 | 0.0 | Executive | 18690.0 |
| 1665 | 0 | NaN | Self Enquiry | 1 | 27.0 | Salaried | Male | 3 | 1.0 | Basic | 5.0 | Married | 2.0 | 1 | 3 | 1 | 0.0 | Executive | 18564.0 |
| 1672 | 0 | NaN | Company Invited | 1 | 16.0 | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Married | 2.0 | 1 | 3 | 1 | 1.0 | Executive | 19829.0 |
| 1708 | 0 | NaN | Self Enquiry | 3 | 10.0 | Salaried | Female | 2 | 3.0 | Basic | 4.0 | Married | 3.0 | 0 | 3 | 1 | 1.0 | Executive | 19932.0 |
| 1709 | 1 | NaN | Self Enquiry | 1 | 6.0 | Salaried | Male | 3 | 4.0 | Basic | 3.0 | Single | 1.0 | 1 | 5 | 0 | 2.0 | Executive | 19778.0 |
| 1718 | 0 | NaN | Self Enquiry | 1 | 6.0 | Small Business | Female | 2 | 4.0 | Basic | 5.0 | Married | 3.0 | 0 | 4 | 0 | 1.0 | Executive | 19859.0 |
| 1729 | 1 | NaN | Company Invited | 1 | 35.0 | Small Business | Male | 3 | 4.0 | Basic | 4.0 | Single | 1.0 | 0 | 3 | 0 | 2.0 | Executive | 18479.0 |
| 1734 | 1 | NaN | Self Enquiry | 1 | 8.0 | Salaried | Male | 3 | 3.0 | Basic | 3.0 | Single | 3.0 | 0 | 4 | 0 | 1.0 | Executive | 18485.0 |
| 1737 | 0 | NaN | Company Invited | 1 | 11.0 | Salaried | Male | 2 | 1.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 5 | 1 | 1.0 | Manager | 22889.0 |
| 1768 | 0 | NaN | Company Invited | 1 | 24.0 | Salaried | Male | 2 | 1.0 | Basic | 3.0 | Married | 5.0 | 0 | 5 | 1 | 1.0 | Executive | 18688.0 |
| 1793 | 1 | NaN | Self Enquiry | 1 | 8.0 | Small Business | Male | 2 | 5.0 | Basic | 3.0 | Married | 6.0 | 1 | 3 | 1 | 1.0 | Executive | 18464.0 |
| 1804 | 0 | NaN | Self Enquiry | 1 | 14.0 | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 3 | 1 | 0.0 | Manager | 18697.0 |
| 1807 | 0 | NaN | Self Enquiry | 1 | 15.0 | Salaried | Male | 1 | 4.0 | Basic | 3.0 | Single | 1.0 | 0 | 1 | 1 | 0.0 | Executive | 19778.0 |
| 1834 | 0 | NaN | Self Enquiry | 1 | 16.0 | Small Business | Female | 3 | 3.0 | Basic | 5.0 | Married | 7.0 | 0 | 1 | 1 | 2.0 | Executive | 18452.0 |
| 1843 | 0 | NaN | Self Enquiry | 1 | 6.0 | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Married | 2.0 | 0 | 3 | 0 | 0.0 | Executive | 19829.0 |
| 1859 | 0 | NaN | Self Enquiry | 1 | 16.0 | Salaried | Male | 2 | 3.0 | Deluxe | 4.0 | Single | 3.0 | 1 | 5 | 0 | 1.0 | Manager | 22889.0 |
| 1861 | 0 | NaN | Self Enquiry | 1 | 8.0 | Small Business | Female | 3 | 4.0 | Deluxe | 3.0 | Married | 7.0 | 0 | 5 | 1 | 1.0 | Manager | 18448.0 |
| 1875 | 1 | NaN | Self Enquiry | 1 | 9.0 | Small Business | Male | 3 | 3.0 | Basic | 5.0 | Married | 6.0 | 0 | 5 | 1 | 1.0 | Executive | 19829.0 |
| 1901 | 0 | NaN | Self Enquiry | 1 | 5.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 1.0 | 0 | 1 | 0 | 0.0 | Manager | 22476.0 |
| 1906 | 1 | NaN | Self Enquiry | 1 | 16.0 | Small Business | Male | 2 | 5.0 | Basic | 3.0 | Married | 1.0 | 0 | 5 | 0 | 0.0 | Executive | 18408.0 |
| 1913 | 1 | NaN | Company Invited | 1 | 10.0 | Large Business | Male | 3 | 4.0 | Basic | 3.0 | Single | 6.0 | 0 | 5 | 0 | 0.0 | Executive | 19894.0 |
| 1914 | 0 | NaN | Self Enquiry | 3 | 8.0 | Small Business | Female | 2 | 3.0 | Deluxe | 3.0 | Married | 3.0 | 0 | 5 | 1 | 0.0 | Manager | 22639.0 |
| 1919 | 0 | NaN | Company Invited | 1 | 14.0 | Salaried | Female | 2 | 3.0 | Basic | 3.0 | Married | 2.0 | 0 | 1 | 1 | 1.0 | Executive | 19932.0 |
| 1951 | 0 | NaN | Self Enquiry | 1 | 6.0 | Salaried | Male | 2 | 4.0 | Basic | 3.0 | Married | 2.0 | 1 | 1 | 1 | 1.0 | Executive | 18622.0 |
| 1953 | 0 | NaN | Self Enquiry | 1 | 31.0 | Salaried | Male | 2 | 4.0 | Deluxe | 3.0 | Single | 5.0 | 0 | 1 | 1 | 0.0 | Manager | 18681.0 |
| 1958 | 0 | NaN | Self Enquiry | 1 | 8.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 1 | 0 | 0.0 | Manager | 22476.0 |
| 1966 | 0 | NaN | Self Enquiry | 3 | 28.0 | Large Business | Male | 2 | 3.0 | Basic | 3.0 | Single | 2.0 | 0 | 1 | 0 | 0.0 | Executive | 18447.0 |
| 1984 | 1 | NaN | Company Invited | 1 | 9.0 | Salaried | Male | 3 | 3.0 | Basic | 5.0 | Single | 2.0 | 1 | 5 | 1 | 0.0 | Executive | 18348.0 |
| 1988 | 0 | NaN | Self Enquiry | 3 | 13.0 | Small Business | Female | 2 | 4.0 | Deluxe | 3.0 | Single | 1.0 | 0 | 1 | 1 | 0.0 | Manager | 22639.0 |
| 2009 | 0 | NaN | Self Enquiry | 3 | 14.0 | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Married | 1.0 | 0 | 1 | 0 | 1.0 | Executive | 19829.0 |
| 2013 | 0 | NaN | Company Invited | 1 | 30.0 | Small Business | Male | 2 | 5.0 | Basic | 3.0 | Single | 3.0 | 0 | 3 | 0 | 0.0 | Executive | 18708.0 |
| 2035 | 0 | NaN | Self Enquiry | 1 | 16.0 | Small Business | Male | 3 | 1.0 | Basic | 3.0 | Single | 2.0 | 0 | 1 | 0 | 1.0 | Executive | 18505.0 |
| 2051 | 0 | NaN | Self Enquiry | 1 | 6.0 | Salaried | Male | 2 | 5.0 | Basic | 3.0 | Married | 4.0 | 0 | 4 | 1 | 0.0 | Executive | 19778.0 |
| 2083 | 0 | NaN | Self Enquiry | 2 | 9.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Married | 1.0 | 0 | 1 | 1 | 0.0 | Executive | 19778.0 |
| 2088 | 0 | NaN | Self Enquiry | 1 | 8.0 | Small Business | Male | 3 | 1.0 | Basic | 5.0 | Single | 1.0 | 0 | 1 | 0 | 1.0 | Executive | 18424.0 |
| 2089 | 0 | NaN | Self Enquiry | 3 | 6.0 | Small Business | Male | 2 | 1.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 4 | 0 | 1.0 | Manager | 22697.0 |
| 2093 | 0 | NaN | Company Invited | 1 | 11.0 | Salaried | Male | 3 | 4.0 | Basic | 3.0 | Married | 1.0 | 0 | 5 | 0 | 0.0 | Executive | 19778.0 |
| 2104 | 0 | NaN | Self Enquiry | 3 | 9.0 | Salaried | Male | 3 | 3.0 | Deluxe | 5.0 | Married | 2.0 | 1 | 4 | 0 | 2.0 | Manager | 22889.0 |
| 2109 | 0 | NaN | Self Enquiry | 1 | 6.0 | Large Business | Female | 2 | 3.0 | Basic | 5.0 | Married | 3.0 | 0 | 4 | 0 | 0.0 | Executive | 18580.0 |
| 2136 | 1 | NaN | Self Enquiry | 1 | 9.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Married | 1.0 | 1 | 5 | 1 | 1.0 | Manager | 22476.0 |
| 2146 | 0 | NaN | Self Enquiry | 1 | 27.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 1 | 3 | 0 | 0.0 | Manager | 22476.0 |
| 2161 | 0 | NaN | Company Invited | 1 | 15.0 | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Married | 1.0 | 1 | 4 | 0 | 0.0 | Executive | 18617.0 |
| 2182 | 0 | NaN | Self Enquiry | 1 | 19.0 | Salaried | Female | 2 | 3.0 | Basic | 3.0 | Single | 4.0 | 0 | 3 | 1 | 0.0 | Executive | 18452.0 |
| 2188 | 0 | NaN | Company Invited | 1 | 29.0 | Salaried | Male | 2 | 3.0 | Deluxe | 5.0 | Married | 2.0 | 1 | 1 | 1 | 0.0 | Manager | 18633.0 |
| 2189 | 0 | NaN | Self Enquiry | 3 | 10.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 1 | 3 | 1 | 1.0 | Manager | 22476.0 |
| 2195 | 1 | NaN | Self Enquiry | 1 | 20.0 | Salaried | Male | 2 | 4.0 | Basic | 4.0 | Married | 2.0 | 1 | 5 | 1 | 1.0 | Executive | 19778.0 |
| 2196 | 0 | NaN | Company Invited | 1 | 6.0 | Salaried | Female | 3 | 3.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 5 | 1 | 0.0 | Manager | 22476.0 |
| 2210 | 0 | NaN | Self Enquiry | 1 | 16.0 | Salaried | Male | 2 | 3.0 | Basic | 3.0 | Married | 2.0 | 0 | 5 | 0 | 0.0 | Executive | 19778.0 |
| 2226 | 0 | NaN | Company Invited | 1 | 35.0 | Small Business | Female | 3 | 3.0 | Basic | 3.0 | Single | 1.0 | 0 | 3 | 1 | 2.0 | Executive | 19859.0 |
| 2237 | 0 | NaN | Self Enquiry | 1 | 9.0 | Salaried | Female | 2 | 3.0 | Deluxe | 4.0 | Single | 4.0 | 0 | 4 | 1 | 0.0 | Manager | 22476.0 |
| 2251 | 0 | NaN | Self Enquiry | 1 | 6.0 | Small Business | Male | 2 | 4.0 | Basic | 5.0 | Married | 2.0 | 0 | 3 | 1 | 1.0 | Executive | 19829.0 |
| 2253 | 0 | NaN | Self Enquiry | 1 | 13.0 | Large Business | Female | 2 | 1.0 | Basic | 3.0 | Married | 2.0 | 0 | 3 | 0 | 0.0 | Executive | 18376.0 |
| 2258 | 0 | NaN | Self Enquiry | 1 | 16.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 4.0 | 0 | 4 | 0 | 1.0 | Manager | 18660.0 |
| 2266 | 1 | NaN | Self Enquiry | 1 | 10.0 | Large Business | Male | 2 | 3.0 | Basic | 3.0 | Married | 2.0 | 1 | 3 | 1 | 1.0 | Executive | 18691.0 |
| 2292 | 0 | NaN | Company Invited | 1 | 8.0 | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Single | 3.0 | 0 | 4 | 1 | 1.0 | Manager | 22889.0 |
| 2311 | 0 | NaN | Self Enquiry | 1 | 30.0 | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Single | 1.0 | 0 | 5 | 0 | 1.0 | Executive | 18597.0 |
| 2333 | 0 | NaN | Self Enquiry | 2 | 8.0 | Salaried | Male | 3 | 3.0 | Basic | 3.0 | Married | 2.0 | 0 | 1 | 1 | 2.0 | Executive | 18600.0 |
| 2335 | 0 | NaN | Self Enquiry | 3 | 35.0 | Salaried | Male | 3 | 3.0 | Deluxe | 5.0 | Married | 1.0 | 0 | 1 | 1 | 0.0 | Manager | 22889.0 |
| 2344 | 0 | NaN | Self Enquiry | 1 | 13.0 | Salaried | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 5 | 1 | 1.0 | Manager | 18491.0 |
| 2356 | 0 | NaN | Self Enquiry | 1 | 7.0 | Small Business | Male | 3 | 3.0 | Basic | 4.0 | Married | 1.0 | 1 | 4 | 1 | 2.0 | Executive | 18579.0 |
| 2363 | 0 | NaN | Self Enquiry | 1 | 7.0 | Salaried | Female | 3 | 3.0 | Basic | 3.0 | Married | 2.0 | 0 | 1 | 1 | 2.0 | Executive | 19932.0 |
| 2370 | 0 | NaN | Company Invited | 1 | 9.0 | Large Business | Male | 2 | 3.0 | Basic | 4.0 | Married | 2.0 | 0 | 3 | 0 | 1.0 | Executive | 18692.0 |
| 2394 | 1 | NaN | Company Invited | 1 | 8.0 | Salaried | Female | 2 | 4.0 | Basic | 5.0 | Single | 3.0 | 1 | 3 | 0 | 0.0 | Executive | 18506.0 |
| 2399 | 1 | NaN | Company Invited | 3 | 19.0 | Large Business | Female | 2 | 3.0 | Deluxe | 4.0 | Single | 6.0 | 0 | 3 | 1 | 0.0 | Manager | 22101.0 |
| 2410 | 1 | NaN | Self Enquiry | 1 | 30.0 | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Married | 2.0 | 1 | 1 | 0 | 0.0 | Executive | 19829.0 |
| 2413 | 1 | NaN | Self Enquiry | 3 | 21.0 | Small Business | Male | 2 | 5.0 | Deluxe | 3.0 | Married | 7.0 | 1 | 1 | 0 | 1.0 | Manager | 18423.0 |
| 2427 | 1 | NaN | Self Enquiry | 3 | 22.0 | Small Business | Male | 3 | 3.0 | Standard | 3.0 | Married | 3.0 | 0 | 5 | 0 | 1.0 | Senior Manager | 18544.0 |
| 2430 | 1 | NaN | Self Enquiry | 1 | 14.0 | Small Business | Female | 3 | 3.0 | Basic | 5.0 | Married | 2.0 | 1 | 3 | 0 | 2.0 | Executive | 19859.0 |
| 2435 | 1 | NaN | Self Enquiry | 2 | 26.0 | Small Business | Female | 3 | 3.0 | Basic | 4.0 | Married | 1.0 | 1 | 3 | 0 | 1.0 | Executive | 18669.0 |
# We'll impute these missing values one by one, taking the mean of Age within each Occupation, Designation and ProductPitched group
df.groupby(["Occupation", "Designation", "ProductPitched"])["Age"].mean().round(0)
Occupation Designation ProductPitched
Free Lancer AVP Basic NaN
Deluxe NaN
King NaN
Standard NaN
Super Deluxe NaN
Executive Basic 38.0
Deluxe NaN
King NaN
Standard NaN
Super Deluxe NaN
Manager Basic NaN
Deluxe NaN
King NaN
Standard NaN
Super Deluxe NaN
Senior Manager Basic NaN
Deluxe NaN
King NaN
Standard NaN
Super Deluxe NaN
VP Basic NaN
Deluxe NaN
King NaN
Standard NaN
Super Deluxe NaN
Large Business AVP Basic NaN
Deluxe NaN
King NaN
Standard NaN
Super Deluxe 48.0
Executive Basic 32.0
Deluxe NaN
King NaN
Standard NaN
Super Deluxe NaN
Manager Basic NaN
Deluxe 38.0
King NaN
Standard NaN
Super Deluxe NaN
Senior Manager Basic NaN
Deluxe NaN
King NaN
Standard 42.0
Super Deluxe NaN
VP Basic NaN
Deluxe NaN
King 47.0
Standard NaN
Super Deluxe NaN
Salaried AVP Basic NaN
Deluxe NaN
King NaN
Standard NaN
Super Deluxe 47.0
Executive Basic 34.0
Deluxe NaN
King NaN
Standard NaN
Super Deluxe NaN
Manager Basic NaN
Deluxe 37.0
King NaN
Standard NaN
Super Deluxe NaN
Senior Manager Basic NaN
Deluxe NaN
King NaN
Standard 40.0
Super Deluxe NaN
VP Basic NaN
Deluxe NaN
King 48.0
Standard NaN
Super Deluxe NaN
Small Business AVP Basic NaN
Deluxe NaN
King NaN
Standard NaN
Super Deluxe 49.0
Executive Basic 33.0
Deluxe NaN
King NaN
Standard NaN
Super Deluxe NaN
Manager Basic NaN
Deluxe 37.0
King NaN
Standard NaN
Super Deluxe NaN
Senior Manager Basic NaN
Deluxe NaN
King NaN
Standard 41.0
Super Deluxe NaN
VP Basic NaN
Deluxe NaN
King 48.0
Standard NaN
Super Deluxe NaN
Name: Age, dtype: float64
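Most of the NaN entries above look like category combinations that never occur in the data (for example, a Free Lancer with the AVP designation), not groups whose ages are all missing. This is what pandas shows when the grouping columns have the `category` dtype, which is an assumption here. A minimal sketch on a toy frame with made-up values shows how `observed=True` would restrict the summary to combinations that are actually present:

```python
import pandas as pd

# Toy frame with hypothetical values; the grouping columns are categorical,
# which is assumed to match how df was prepared earlier in the notebook.
toy = pd.DataFrame(
    {
        "Occupation": pd.Categorical(["Salaried", "Salaried", "Small Business"]),
        "Designation": pd.Categorical(["Executive", "Manager", "Executive"]),
        "Age": [30.0, 40.0, 35.0],
    }
)

# observed=False lists every category combination, including empty ones (NaN)
all_combos = toy.groupby(["Occupation", "Designation"], observed=False)["Age"].mean()

# observed=True keeps only the combinations that appear in the data
seen_combos = toy.groupby(["Occupation", "Designation"], observed=True)["Age"].mean()

print(len(all_combos), len(seen_combos))
```

The same keyword could be passed to the three-column groupby on `df` to get a shorter table; the group means themselves would not change.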
# Impute missing values of Age
df["Age"] = df.groupby(["Occupation", "Designation", "ProductPitched"])[
"Age"
].transform(lambda x: round(x.fillna(x.mean())))
df[df["Age"].isnull()]
| | ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
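The empty result confirms that no Age values remain missing. To see how the group-wise fill works, here is a small sketch of the same `transform` pattern on a toy frame. The values are made up and only one grouping column is used to keep it short:

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values: one missing Age per Designation group
toy = pd.DataFrame(
    {
        "Designation": ["Executive", "Executive", "Executive", "Manager", "Manager"],
        "Age": [30.0, 36.0, np.nan, 40.0, np.nan],
    }
)

# Within each group, replace NaN with the group mean, then round to whole years
toy["Age"] = toy.groupby("Designation")["Age"].transform(
    lambda s: s.fillna(s.mean()).round()
)

print(toy)
```

Each missing Age takes the mean of its own group (33 for Executive, 40 for Manager). A group whose ages are all missing would stay NaN, because its mean is NaN too, which is why the check above is worth running after the imputation.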
# Checking the rows with NumberOfTrips missing
df[df["NumberOfTrips"].isnull()]
| | ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2469 | 0 | 54.0 | Self Enquiry | 1 | 12.0 | Salaried | Female | 3 | 4.0 | King | 3.0 | Married | NaN | 0 | 5 | 0 | 2.0 | VP | 37772.0 |
| 2473 | 0 | 47.0 | Self Enquiry | 3 | 9.0 | Small Business | Female | 3 | 4.0 | King | 5.0 | Single | NaN | 0 | 2 | 0 | 2.0 | VP | 37625.0 |
| 2506 | 0 | 51.0 | Self Enquiry | 1 | 14.0 | Small Business | Female | 3 | 5.0 | King | 3.0 | Married | NaN | 1 | 4 | 1 | 2.0 | VP | 37418.0 |
| 2549 | 0 | 60.0 | Company Invited | 2 | 9.0 | Salaried | Female | 3 | 5.0 | King | 3.0 | Divorced | NaN | 0 | 2 | 1 | 2.0 | VP | 37522.0 |
| 2550 | 0 | 51.0 | Company Invited | 1 | 7.0 | Salaried | Female | 4 | 4.0 | King | 4.0 | Divorced | NaN | 1 | 2 | 0 | 3.0 | VP | 36850.0 |
| 2556 | 0 | 55.0 | Company Invited | 2 | 33.0 | Salaried | Female | 2 | 3.0 | Super Deluxe | 3.0 | Single | NaN | 1 | 3 | 1 | 1.0 | AVP | 36006.0 |
| 2563 | 0 | 44.0 | Company Invited | 3 | 33.0 | Salaried | Male | 4 | 4.0 | Super Deluxe | 3.0 | Divorced | NaN | 1 | 2 | 0 | 1.0 | AVP | 35637.0 |
| 2567 | 0 | 52.0 | Self Enquiry | 1 | 13.0 | Salaried | Male | 3 | 4.0 | King | 3.0 | Single | NaN | 0 | 5 | 1 | 1.0 | VP | 38215.0 |
| 2591 | 0 | 42.0 | Company Invited | 1 | 17.0 | Salaried | Male | 4 | 5.0 | Super Deluxe | 5.0 | Married | NaN | 0 | 2 | 1 | 1.0 | AVP | 35859.0 |
| 2630 | 0 | 41.0 | Self Enquiry | 1 | 11.0 | Small Business | Female | 3 | 4.0 | King | 3.0 | Divorced | NaN | 0 | 5 | 0 | 2.0 | VP | 37711.0 |
| 2631 | 0 | 56.0 | Self Enquiry | 1 | 21.0 | Small Business | Male | 4 | 3.0 | King | 4.0 | Single | NaN | 0 | 2 | 0 | 3.0 | VP | 37400.0 |
| 2675 | 0 | 43.0 | Self Enquiry | 1 | 11.0 | Large Business | Male | 3 | 4.0 | King | 3.0 | Divorced | NaN | 0 | 2 | 0 | 2.0 | VP | 37910.0 |
| 2677 | 0 | 51.0 | Self Enquiry | 3 | 7.0 | Small Business | Female | 4 | 4.0 | King | 3.0 | Married | NaN | 0 | 2 | 1 | 2.0 | VP | 38195.0 |
| 2681 | 0 | 53.0 | Company Invited | 3 | 9.0 | Salaried | Male | 4 | 5.0 | King | 3.0 | Single | NaN | 1 | 2 | 1 | 3.0 | VP | 37746.0 |
| 2688 | 0 | 46.0 | Self Enquiry | 1 | 7.0 | Salaried | Male | 4 | 3.0 | King | 3.0 | Divorced | NaN | 0 | 4 | 1 | 3.0 | VP | 37880.0 |
| 2701 | 0 | 41.0 | Self Enquiry | 1 | 9.0 | Small Business | Male | 3 | 4.0 | King | 3.0 | Married | NaN | 0 | 5 | 0 | 1.0 | VP | 38114.0 |
| 2714 | 0 | 56.0 | Self Enquiry | 1 | 7.0 | Small Business | Male | 3 | 4.0 | King | 5.0 | Single | NaN | 1 | 2 | 1 | 1.0 | VP | 37723.0 |
| 2723 | 0 | 51.0 | Self Enquiry | 1 | 11.0 | Salaried | Male | 4 | 4.0 | King | 4.0 | Married | NaN | 0 | 5 | 1 | 1.0 | VP | 37822.0 |
| 2724 | 0 | 54.0 | Self Enquiry | 1 | 10.0 | Small Business | Male | 3 | 4.0 | Super Deluxe | 3.0 | Divorced | NaN | 1 | 5 | 0 | 1.0 | AVP | 36262.0 |
| 2734 | 0 | 50.0 | Company Invited | 1 | 17.0 | Salaried | Female | 3 | 4.0 | King | 5.0 | Single | NaN | 1 | 3 | 0 | 1.0 | VP | 37343.0 |
| 2758 | 0 | 40.0 | Self Enquiry | 1 | 17.0 | Small Business | Male | 3 | 4.0 | Super Deluxe | 5.0 | Divorced | NaN | 1 | 3 | 1 | 2.0 | AVP | 35746.0 |
| 2770 | 0 | 40.0 | Company Invited | 1 | 6.0 | Small Business | Male | 3 | 5.0 | King | 3.0 | Divorced | NaN | 0 | 5 | 1 | 2.0 | VP | 37950.0 |
| 2773 | 0 | 48.0 | Self Enquiry | 1 | 12.0 | Salaried | Male | 3 | 4.0 | King | 3.0 | Divorced | NaN | 0 | 2 | 0 | 2.0 | VP | 36978.0 |
| 2836 | 0 | 55.0 | Self Enquiry | 1 | 12.0 | Small Business | Male | 3 | 4.0 | King | 5.0 | Divorced | NaN | 0 | 4 | 1 | 1.0 | VP | 38084.0 |
| 2844 | 0 | 40.0 | Company Invited | 1 | 7.0 | Salaried | Male | 3 | 4.0 | King | 3.0 | Married | NaN | 1 | 3 | 0 | 2.0 | VP | 37875.0 |
| 2861 | 0 | 41.0 | Self Enquiry | 3 | 9.0 | Salaried | Female | 4 | 4.0 | King | 3.0 | Divorced | NaN | 0 | 3 | 0 | 1.0 | VP | 36719.0 |
| 2869 | 0 | 51.0 | Self Enquiry | 1 | 36.0 | Salaried | Male | 3 | 5.0 | Super Deluxe | 3.0 | Divorced | NaN | 0 | 5 | 1 | 2.0 | AVP | 35724.0 |
| 2873 | 0 | 47.0 | Self Enquiry | 1 | 9.0 | Salaried | Male | 3 | 4.0 | Super Deluxe | 3.0 | Divorced | NaN | 0 | 5 | 0 | 1.0 | AVP | 36539.0 |
| 2917 | 0 | 50.0 | Self Enquiry | 1 | 25.0 | Salaried | Male | 3 | 5.0 | King | 3.0 | Married | NaN | 1 | 3 | 0 | 2.0 | VP | 38180.0 |
| 2921 | 0 | 51.0 | Company Invited | 2 | 10.0 | Small Business | Male | 3 | 4.0 | King | 4.0 | Divorced | NaN | 0 | 2 | 1 | 1.0 | VP | 36878.0 |
| 2941 | 0 | 45.0 | Self Enquiry | 1 | 10.0 | Salaried | Male | 3 | 4.0 | King | 3.0 | Divorced | NaN | 1 | 2 | 1 | 1.0 | VP | 38191.0 |
| 2979 | 0 | 42.0 | Self Enquiry | 2 | 17.0 | Salaried | Male | 4 | 5.0 | King | 3.0 | Married | NaN | 0 | 2 | 1 | 1.0 | VP | 37819.0 |
| 2982 | 0 | 42.0 | Self Enquiry | 2 | 7.0 | Salaried | Male | 3 | 5.0 | King | 3.0 | Divorced | NaN | 0 | 2 | 1 | 2.0 | VP | 37867.0 |
| 3028 | 0 | 43.0 | Company Invited | 1 | 15.0 | Salaried | Male | 4 | 4.0 | King | 3.0 | Married | NaN | 0 | 5 | 0 | 2.0 | VP | 37108.0 |
| 3032 | 0 | 51.0 | Self Enquiry | 1 | 9.0 | Small Business | Male | 4 | 4.0 | Super Deluxe | 3.0 | Divorced | NaN | 0 | 2 | 0 | 2.0 | AVP | 36317.0 |
| 3039 | 1 | 59.0 | Self Enquiry | 1 | 9.0 | Salaried | Male | 3 | 4.0 | King | 4.0 | Single | NaN | 1 | 3 | 1 | 2.0 | VP | 37924.0 |
| 3053 | 0 | 44.0 | Self Enquiry | 1 | 21.0 | Salaried | Male | 4 | 4.0 | Super Deluxe | 5.0 | Divorced | NaN | 0 | 5 | 1 | 1.0 | AVP | 35837.0 |
| 3097 | 0 | 51.0 | Company Invited | 1 | 9.0 | Salaried | Male | 4 | 4.0 | Super Deluxe | 5.0 | Married | NaN | 0 | 5 | 1 | 3.0 | AVP | 36602.0 |
| 3143 | 0 | 53.0 | Self Enquiry | 1 | 7.0 | Salaried | Male | 4 | 4.0 | Super Deluxe | 3.0 | Divorced | NaN | 0 | 2 | 1 | 2.0 | AVP | 35777.0 |
| 3154 | 0 | 34.0 | Company Invited | 3 | 24.0 | Salaried | Male | 3 | 4.0 | Super Deluxe | 3.0 | Single | NaN | 0 | 3 | 1 | 2.0 | AVP | 36122.0 |
| 3158 | 0 | 51.0 | Self Enquiry | 1 | 7.0 | Small Business | Male | 4 | 4.0 | Super Deluxe | 3.0 | Married | NaN | 0 | 5 | 0 | 3.0 | AVP | 36077.0 |
| 3160 | 0 | 42.0 | Company Invited | 1 | 16.0 | Small Business | Male | 4 | 4.0 | King | 3.0 | Married | NaN | 0 | 4 | 1 | 3.0 | VP | 38097.0 |
| 3185 | 0 | 43.0 | Self Enquiry | 3 | 12.0 | Small Business | Male | 3 | 4.0 | King | 3.0 | Divorced | NaN | 0 | 4 | 0 | 2.0 | VP | 36981.0 |
| 3199 | 0 | 46.0 | Self Enquiry | 3 | 18.0 | Salaried | Female | 3 | 4.0 | Super Deluxe | 3.0 | Divorced | NaN | 0 | 4 | 1 | 2.0 | AVP | 36328.0 |
| 3210 | 0 | 51.0 | Self Enquiry | 1 | 9.0 | Small Business | Male | 4 | 4.0 | King | 3.0 | Divorced | NaN | 1 | 2 | 1 | 2.0 | VP | 37915.0 |
| 3243 | 0 | 43.0 | Self Enquiry | 1 | 9.0 | Small Business | Male | 3 | 4.0 | Super Deluxe | 5.0 | Divorced | NaN | 0 | 3 | 1 | 2.0 | AVP | 36343.0 |
| 3254 | 0 | 47.0 | Self Enquiry | 3 | 10.0 | Small Business | Male | 3 | 4.0 | Super Deluxe | 3.0 | Divorced | NaN | 0 | 2 | 1 | 1.0 | AVP | 36143.0 |
| 3302 | 0 | 54.0 | Self Enquiry | 1 | 14.0 | Small Business | Female | 3 | 4.0 | King | 3.0 | Married | NaN | 0 | 2 | 1 | 1.0 | VP | 37284.0 |
| 3305 | 0 | 47.0 | Self Enquiry | 3 | 9.0 | Small Business | Female | 4 | 4.0 | Super Deluxe | 5.0 | Divorced | NaN | 0 | 2 | 1 | 1.0 | AVP | 35726.0 |
| 3311 | 0 | 51.0 | Company Invited | 1 | 9.0 | Small Business | Female | 2 | 4.0 | Super Deluxe | 5.0 | Divorced | NaN | 0 | 3 | 1 | 1.0 | AVP | 36534.0 |
| 3313 | 0 | 47.0 | Self Enquiry | 1 | 22.0 | Salaried | Male | 3 | 4.0 | King | 4.0 | Divorced | NaN | 0 | 3 | 1 | 2.0 | VP | 37759.0 |
| 3338 | 0 | 55.0 | Self Enquiry | 1 | 10.0 | Salaried | Male | 3 | 4.0 | Super Deluxe | 3.0 | Divorced | NaN | 0 | 5 | 1 | 2.0 | AVP | 36457.0 |
| 3343 | 0 | 50.0 | Self Enquiry | 1 | 11.0 | Small Business | Male | 3 | 5.0 | King | 3.0 | Divorced | NaN | 0 | 3 | 0 | 1.0 | VP | 37389.0 |
| 3348 | 0 | 49.0 | Self Enquiry | 1 | 7.0 | Salaried | Male | 4 | 5.0 | King | 3.0 | Single | NaN | 0 | 3 | 1 | 1.0 | VP | 36943.0 |
| 3351 | 0 | 45.0 | Self Enquiry | 3 | 12.0 | Small Business | Male | 3 | 4.0 | King | 4.0 | Divorced | NaN | 0 | 5 | 1 | 2.0 | VP | 36891.0 |
| 3357 | 1 | 46.0 | Self Enquiry | 3 | 9.0 | Small Business | Female | 4 | 6.0 | King | 4.0 | Single | NaN | 1 | 5 | 0 | 3.0 | VP | 37502.0 |
| 3360 | 0 | 47.0 | Self Enquiry | 3 | 11.0 | Small Business | Female | 3 | 5.0 | King | 4.0 | Divorced | NaN | 0 | 5 | 1 | 1.0 | VP | 37467.0 |
| 3366 | 0 | 45.0 | Self Enquiry | 1 | 11.0 | Salaried | Male | 4 | 2.0 | King | 5.0 | Married | NaN | 0 | 5 | 1 | 3.0 | VP | 37868.0 |
| 3380 | 0 | 46.0 | Company Invited | 1 | 32.0 | Small Business | Female | 3 | 4.0 | King | 4.0 | Single | NaN | 0 | 4 | 1 | 2.0 | VP | 36739.0 |
| 3381 | 0 | 40.0 | Self Enquiry | 1 | 20.0 | Small Business | Female | 4 | 5.0 | Super Deluxe | 4.0 | Married | NaN | 1 | 3 | 1 | 3.0 | AVP | 35801.0 |
| 3398 | 0 | 43.0 | Company Invited | 1 | 9.0 | Salaried | Male | 3 | 4.0 | Super Deluxe | 3.0 | Divorced | NaN | 1 | 4 | 0 | 2.0 | AVP | 36539.0 |
| 3399 | 0 | 56.0 | Self Enquiry | 1 | 9.0 | Small Business | Female | 3 | 6.0 | King | 3.0 | Divorced | NaN | 0 | 4 | 1 | 2.0 | VP | 37865.0 |
| 3452 | 0 | 55.0 | Self Enquiry | 1 | 7.0 | Small Business | Female | 3 | 4.0 | Super Deluxe | 3.0 | Single | NaN | 0 | 4 | 1 | 2.0 | AVP | 36006.0 |
| 3468 | 0 | 48.0 | Self Enquiry | 1 | 9.0 | Small Business | Female | 3 | 4.0 | Super Deluxe | 3.0 | Divorced | NaN | 0 | 5 | 1 | 1.0 | AVP | 35847.0 |
| 3499 | 0 | 35.0 | Company Invited | 1 | 22.0 | Small Business | Male | 4 | 4.0 | Super Deluxe | 5.0 | Married | NaN | 0 | 4 | 1 | 3.0 | AVP | 35685.0 |
| 3570 | 0 | 51.0 | Self Enquiry | 3 | 6.0 | Small Business | Male | 3 | 4.0 | King | 3.0 | Married | NaN | 1 | 5 | 0 | 2.0 | VP | 38009.0 |
| 3579 | 0 | 47.0 | Self Enquiry | 3 | 7.0 | Salaried | Male | 3 | 2.0 | Super Deluxe | 5.0 | Single | NaN | 0 | 4 | 1 | 2.0 | AVP | 36245.0 |
| 3584 | 0 | 45.0 | Self Enquiry | 1 | 14.0 | Small Business | Female | 3 | 4.0 | King | 3.0 | Married | NaN | 1 | 4 | 1 | 2.0 | VP | 37727.0 |
| 3628 | 0 | 55.0 | Self Enquiry | 1 | 29.0 | Small Business | Female | 4 | 4.0 | Super Deluxe | 3.0 | Married | NaN | 0 | 1 | 1 | 2.0 | AVP | 36104.0 |
| 3629 | 0 | 44.0 | Self Enquiry | 1 | 22.0 | Salaried | Male | 4 | 5.0 | Super Deluxe | 3.0 | Married | NaN | 0 | 3 | 1 | 2.0 | AVP | 36281.0 |
| 3708 | 0 | 56.0 | Self Enquiry | 1 | 9.0 | Small Business | Male | 3 | 5.0 | King | 5.0 | Married | NaN | 0 | 5 | 0 | 2.0 | VP | 37716.0 |
| 3721 | 0 | 47.0 | Self Enquiry | 1 | 9.0 | Small Business | Male | 3 | 4.0 | King | 3.0 | Married | NaN | 1 | 1 | 1 | 2.0 | VP | 38006.0 |
| 3774 | 0 | 44.0 | Self Enquiry | 1 | 13.0 | Small Business | Female | 3 | 5.0 | King | 3.0 | Married | NaN | 0 | 3 | 1 | 2.0 | VP | 38070.0 |
| 3795 | 0 | 49.0 | Company Invited | 1 | 29.0 | Small Business | Female | 3 | 4.0 | Super Deluxe | 3.0 | Married | NaN | 0 | 5 | 0 | 1.0 | AVP | 35852.0 |
| 3818 | 0 | 59.0 | Self Enquiry | 3 | 28.0 | Salaried | Female | 4 | 4.0 | Super Deluxe | 3.0 | Married | NaN | 1 | 5 | 1 | 1.0 | AVP | 36553.0 |
| 3821 | 0 | 50.0 | Company Invited | 1 | 9.0 | Salaried | Male | 3 | 4.0 | King | 3.0 | Married | NaN | 0 | 3 | 1 | 1.0 | VP | 37839.0 |
| 3881 | 0 | 40.0 | Company Invited | 1 | 16.0 | Salaried | Male | 3 | 4.0 | King | 3.0 | Single | NaN | 0 | 4 | 0 | 2.0 | VP | 38109.0 |
| 3887 | 0 | 43.0 | Self Enquiry | 1 | 9.0 | Salaried | Male | 3 | 4.0 | King | 3.0 | Married | NaN | 0 | 3 | 1 | 2.0 | VP | 37558.0 |
| 3939 | 0 | 54.0 | Self Enquiry | 1 | 12.0 | Salaried | Female | 3 | 4.0 | King | 3.0 | Married | NaN | 0 | 4 | 1 | 1.0 | VP | 37772.0 |
| 3943 | 0 | 47.0 | Self Enquiry | 3 | 9.0 | Small Business | Female | 3 | 4.0 | King | 5.0 | Single | NaN | 0 | 1 | 1 | 2.0 | VP | 37625.0 |
| 3976 | 0 | 51.0 | Self Enquiry | 1 | 14.0 | Small Business | Female | 3 | 5.0 | King | 3.0 | Married | NaN | 1 | 4 | 1 | 2.0 | VP | 37418.0 |
| 4019 | 0 | 60.0 | Company Invited | 2 | 9.0 | Salaried | Female | 3 | 5.0 | King | 3.0 | Married | NaN | 0 | 1 | 0 | 2.0 | VP | 37522.0 |
| 4020 | 0 | 51.0 | Company Invited | 1 | 7.0 | Salaried | Female | 4 | 4.0 | King | 4.0 | Married | NaN | 1 | 1 | 0 | 3.0 | VP | 36850.0 |
| 4026 | 0 | 55.0 | Company Invited | 2 | 33.0 | Salaried | Female | 2 | 2.0 | Super Deluxe | 3.0 | Single | NaN | 1 | 3 | 1 | 1.0 | AVP | 36006.0 |
| 4033 | 0 | 44.0 | Company Invited | 3 | 33.0 | Salaried | Male | 4 | 4.0 | Super Deluxe | 3.0 | Married | NaN | 1 | 1 | 1 | 1.0 | AVP | 35637.0 |
| 4061 | 0 | 42.0 | Company Invited | 1 | 17.0 | Salaried | Male | 4 | 5.0 | Super Deluxe | 5.0 | Married | NaN | 0 | 1 | 1 | 1.0 | AVP | 35859.0 |
| 4100 | 0 | 41.0 | Self Enquiry | 1 | 11.0 | Small Business | Female | 3 | 4.0 | King | 3.0 | Married | NaN | 0 | 5 | 1 | 2.0 | VP | 37711.0 |
| 4101 | 0 | 56.0 | Self Enquiry | 1 | 21.0 | Small Business | Male | 4 | 2.0 | King | 4.0 | Single | NaN | 0 | 1 | 1 | 2.0 | VP | 37400.0 |
| 4145 | 0 | 43.0 | Self Enquiry | 1 | 11.0 | Large Business | Male | 3 | 4.0 | King | 3.0 | Married | NaN | 0 | 1 | 0 | 1.0 | VP | 37910.0 |
| 4147 | 0 | 51.0 | Self Enquiry | 3 | 7.0 | Small Business | Female | 4 | 4.0 | King | 3.0 | Married | NaN | 0 | 1 | 1 | 1.0 | VP | 38195.0 |
| 4151 | 0 | 53.0 | Company Invited | 3 | 9.0 | Salaried | Male | 4 | 5.0 | King | 3.0 | Single | NaN | 1 | 1 | 1 | 1.0 | VP | 37746.0 |
| 4158 | 0 | 46.0 | Self Enquiry | 1 | 7.0 | Salaried | Male | 4 | 2.0 | King | 3.0 | Married | NaN | 0 | 4 | 1 | 2.0 | VP | 37880.0 |
| 4171 | 0 | 41.0 | Self Enquiry | 1 | 9.0 | Small Business | Male | 3 | 4.0 | King | 3.0 | Married | NaN | 0 | 5 | 1 | 1.0 | VP | 38114.0 |
| 4184 | 0 | 56.0 | Self Enquiry | 1 | 7.0 | Small Business | Male | 3 | 4.0 | King | 5.0 | Single | NaN | 1 | 1 | 1 | 2.0 | VP | 37723.0 |
| 4193 | 0 | 51.0 | Self Enquiry | 1 | 11.0 | Salaried | Male | 4 | 4.0 | King | 4.0 | Married | NaN | 0 | 5 | 0 | 2.0 | VP | 37822.0 |
| 4194 | 0 | 54.0 | Self Enquiry | 1 | 10.0 | Small Business | Male | 3 | 4.0 | Super Deluxe | 3.0 | Married | NaN | 1 | 5 | 1 | 1.0 | AVP | 36262.0 |
| 4204 | 0 | 50.0 | Company Invited | 1 | 17.0 | Salaried | Female | 3 | 4.0 | King | 5.0 | Single | NaN | 1 | 3 | 0 | 2.0 | VP | 37343.0 |
| 4228 | 0 | 40.0 | Self Enquiry | 1 | 17.0 | Small Business | Male | 4 | 4.0 | Super Deluxe | 5.0 | Married | NaN | 1 | 3 | 1 | 3.0 | AVP | 35746.0 |
| 4240 | 0 | 40.0 | Company Invited | 1 | 14.0 | Small Business | Male | 3 | 5.0 | King | 3.0 | Married | NaN | 0 | 5 | 1 | 1.0 | VP | 37950.0 |
| 4243 | 0 | 48.0 | Self Enquiry | 1 | 12.0 | Salaried | Male | 3 | 4.0 | King | 3.0 | Married | NaN | 0 | 1 | 1 | 1.0 | VP | 36978.0 |
| 4306 | 0 | 55.0 | Self Enquiry | 1 | 12.0 | Small Business | Male | 3 | 4.0 | King | 5.0 | Married | NaN | 0 | 4 | 1 | 1.0 | VP | 38084.0 |
| 4314 | 0 | 40.0 | Company Invited | 1 | 7.0 | Salaried | Male | 3 | 4.0 | King | 3.0 | Married | NaN | 1 | 3 | 0 | 1.0 | VP | 37875.0 |
| 4331 | 0 | 41.0 | Self Enquiry | 3 | 9.0 | Salaried | Female | 4 | 4.0 | King | 3.0 | Married | NaN | 0 | 3 | 1 | 1.0 | VP | 36719.0 |
| 4339 | 0 | 51.0 | Self Enquiry | 1 | 36.0 | Salaried | Male | 3 | 5.0 | Super Deluxe | 3.0 | Married | NaN | 0 | 5 | 1 | 2.0 | AVP | 35724.0 |
| 4343 | 0 | 47.0 | Self Enquiry | 1 | 9.0 | Salaried | Male | 3 | 4.0 | Super Deluxe | 3.0 | Married | NaN | 0 | 5 | 1 | 1.0 | AVP | 36539.0 |
| 4387 | 0 | 50.0 | Self Enquiry | 1 | 25.0 | Salaried | Male | 3 | 5.0 | King | 3.0 | Married | NaN | 1 | 3 | 1 | 1.0 | VP | 38180.0 |
| 4391 | 0 | 51.0 | Company Invited | 2 | 10.0 | Small Business | Male | 4 | 4.0 | King | 4.0 | Married | NaN | 0 | 1 | 1 | 3.0 | VP | 36878.0 |
| 4411 | 0 | 45.0 | Self Enquiry | 1 | 10.0 | Salaried | Male | 3 | 4.0 | King | 3.0 | Married | NaN | 1 | 1 | 0 | 2.0 | VP | 38191.0 |
| 4449 | 0 | 42.0 | Self Enquiry | 2 | 17.0 | Salaried | Male | 4 | 5.0 | King | 3.0 | Married | NaN | 0 | 1 | 1 | 2.0 | VP | 37819.0 |
| 4452 | 0 | 42.0 | Self Enquiry | 2 | 7.0 | Salaried | Male | 3 | 5.0 | King | 3.0 | Married | NaN | 0 | 1 | 1 | 2.0 | VP | 37867.0 |
| 4498 | 0 | 43.0 | Company Invited | 1 | 15.0 | Salaried | Male | 4 | 4.0 | King | 3.0 | Married | NaN | 0 | 5 | 1 | 2.0 | VP | 37108.0 |
| 4502 | 0 | 51.0 | Self Enquiry | 1 | 9.0 | Small Business | Male | 4 | 4.0 | Super Deluxe | 3.0 | Married | NaN | 0 | 1 | 0 | 2.0 | AVP | 36317.0 |
| 4509 | 1 | 59.0 | Self Enquiry | 1 | 9.0 | Salaried | Male | 3 | 4.0 | King | 4.0 | Single | NaN | 1 | 3 | 1 | 1.0 | VP | 37924.0 |
| 4523 | 0 | 44.0 | Self Enquiry | 1 | 21.0 | Salaried | Male | 4 | 4.0 | Super Deluxe | 5.0 | Married | NaN | 0 | 5 | 1 | 2.0 | AVP | 35837.0 |
| 4567 | 0 | 51.0 | Company Invited | 1 | 9.0 | Salaried | Male | 4 | 4.0 | Super Deluxe | 5.0 | Married | NaN | 0 | 5 | 1 | 2.0 | AVP | 36602.0 |
| 4613 | 0 | 53.0 | Self Enquiry | 1 | 7.0 | Salaried | Male | 4 | 4.0 | Super Deluxe | 3.0 | Married | NaN | 0 | 1 | 1 | 3.0 | AVP | 35777.0 |
| 4624 | 0 | 34.0 | Company Invited | 3 | 24.0 | Salaried | Male | 3 | 4.0 | Super Deluxe | 3.0 | Single | NaN | 0 | 3 | 1 | 1.0 | AVP | 36122.0 |
| 4628 | 0 | 51.0 | Self Enquiry | 1 | 7.0 | Small Business | Male | 4 | 4.0 | Super Deluxe | 3.0 | Married | NaN | 0 | 5 | 0 | 3.0 | AVP | 36077.0 |
| 4630 | 0 | 42.0 | Company Invited | 1 | 16.0 | Small Business | Male | 4 | 4.0 | King | 3.0 | Married | NaN | 0 | 4 | 0 | 3.0 | VP | 38097.0 |
| 4655 | 0 | 43.0 | Self Enquiry | 3 | 12.0 | Small Business | Male | 3 | 4.0 | King | 3.0 | Married | NaN | 0 | 4 | 0 | 2.0 | VP | 36981.0 |
| 4669 | 0 | 46.0 | Self Enquiry | 3 | 18.0 | Salaried | Female | 3 | 4.0 | Super Deluxe | 3.0 | Married | NaN | 0 | 4 | 0 | 2.0 | AVP | 36328.0 |
| 4680 | 0 | 51.0 | Self Enquiry | 1 | 9.0 | Small Business | Male | 4 | 4.0 | King | 3.0 | Married | NaN | 1 | 1 | 1 | 3.0 | VP | 37915.0 |
| 4713 | 0 | 43.0 | Self Enquiry | 1 | 9.0 | Small Business | Male | 3 | 4.0 | Super Deluxe | 5.0 | Married | NaN | 0 | 3 | 1 | 1.0 | AVP | 36343.0 |
| 4718 | 0 | 49.0 | Company Invited | 1 | 7.0 | Small Business | Male | 3 | 2.0 | Super Deluxe | 3.0 | Single | NaN | 0 | 1 | 0 | 1.0 | AVP | 35563.0 |
| 4724 | 0 | 47.0 | Self Enquiry | 3 | 10.0 | Small Business | Male | 3 | 4.0 | Super Deluxe | 3.0 | Married | NaN | 0 | 1 | 1 | 1.0 | AVP | 36143.0 |
| 4772 | 0 | 54.0 | Self Enquiry | 1 | 14.0 | Small Business | Female | 3 | 4.0 | King | 3.0 | Married | NaN | 0 | 1 | 1 | 2.0 | VP | 37284.0 |
| 4775 | 0 | 47.0 | Self Enquiry | 3 | 9.0 | Small Business | Female | 4 | 4.0 | Super Deluxe | 5.0 | Married | NaN | 0 | 1 | 1 | 2.0 | AVP | 35726.0 |
| 4781 | 0 | 51.0 | Company Invited | 1 | 9.0 | Small Business | Female | 2 | 4.0 | Super Deluxe | 5.0 | Married | NaN | 0 | 3 | 0 | 1.0 | AVP | 36534.0 |
| 4783 | 0 | 47.0 | Self Enquiry | 1 | 22.0 | Salaried | Male | 3 | 4.0 | King | 4.0 | Married | NaN | 0 | 3 | 1 | 2.0 | VP | 37759.0 |
| 4808 | 0 | 55.0 | Self Enquiry | 1 | 10.0 | Salaried | Male | 3 | 4.0 | Super Deluxe | 3.0 | Married | NaN | 0 | 5 | 0 | 1.0 | AVP | 36457.0 |
| 4813 | 0 | 50.0 | Self Enquiry | 1 | 11.0 | Small Business | Male | 3 | 5.0 | King | 3.0 | Married | NaN | 0 | 3 | 0 | 2.0 | VP | 37389.0 |
| 4818 | 1 | 49.0 | Self Enquiry | 1 | 22.0 | Small Business | Female | 4 | 4.0 | Standard | 3.0 | Married | NaN | 0 | 3 | 0 | 1.0 | Senior Manager | 36943.0 |
| 4821 | 1 | 45.0 | Self Enquiry | 1 | 30.0 | Small Business | Male | 4 | 4.0 | Basic | 5.0 | Single | NaN | 0 | 3 | 0 | 3.0 | Executive | 36891.0 |
| 4827 | 1 | 46.0 | Self Enquiry | 3 | 20.0 | Small Business | Male | 3 | 2.0 | Super Deluxe | 3.0 | Single | NaN | 1 | 3 | 1 | 1.0 | AVP | 37502.0 |
| 4830 | 1 | 47.0 | Self Enquiry | 1 | 29.0 | Small Business | Male | 4 | 4.0 | Deluxe | 3.0 | Married | NaN | 1 | 5 | 1 | 3.0 | Manager | 37467.0 |
| 4836 | 1 | 45.0 | Self Enquiry | 3 | 16.0 | Salaried | Male | 4 | 5.0 | Basic | 5.0 | Married | NaN | 0 | 1 | 1 | 3.0 | Executive | 37868.0 |
| 4850 | 1 | 46.0 | Self Enquiry | 3 | 8.0 | Salaried | Male | 4 | 5.0 | Deluxe | 5.0 | Married | NaN | 0 | 4 | 1 | 3.0 | Manager | 36739.0 |
| 4851 | 1 | 40.0 | Self Enquiry | 1 | 9.0 | Salaried | Female | 4 | 4.0 | Basic | 5.0 | Married | NaN | 1 | 1 | 1 | 1.0 | Executive | 35801.0 |
| 4868 | 1 | 43.0 | Company Invited | 2 | 15.0 | Salaried | Female | 4 | 5.0 | Basic | 3.0 | Married | NaN | 0 | 5 | 1 | 2.0 | Executive | 36539.0 |
| 4869 | 1 | 56.0 | Self Enquiry | 3 | 16.0 | Small Business | Female | 3 | 6.0 | Basic | 4.0 | Single | NaN | 0 | 1 | 1 | 2.0 | Executive | 37865.0 |
# Check the overall mean of NumberOfTrips, since EDA did not reveal a clear pattern for this variable
df.NumberOfTrips.mean()
3.236520640269587
# We'll impute each missing value with the mean NumberOfTrips of the customer's Designation
df.groupby(["Designation"])["NumberOfTrips"].mean().round(0)
Designation
AVP 4.0
Executive 3.0
Manager 3.0
Senior Manager 3.0
VP 3.0
Name: NumberOfTrips, dtype: float64
# Impute missing values of NumberOfTrips
df["NumberOfTrips"] = df.groupby(["Designation"])["NumberOfTrips"].transform(
    lambda x: round(x.fillna(x.mean()))
)
df[df["NumberOfTrips"].isnull()]
| ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
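The imputation step above relies on `groupby(...).transform(...)`, which returns a result aligned with the original rows, so each missing value is filled with the mean of its own group. A minimal sketch on toy data may make this easier to follow. The values below are hypothetical and are not taken from the dataset, and the method form `.round()` is used in place of the built-in `round`.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not taken from the tour-package dataset)
toy = pd.DataFrame(
    {
        "Designation": ["VP", "VP", "VP", "AVP", "AVP", "AVP"],
        "NumberOfTrips": [2.0, 4.0, np.nan, 5.0, np.nan, 3.0],
    }
)

# Fill each group's gaps with that group's mean, then round to a whole number
toy["NumberOfTrips"] = toy.groupby("Designation")["NumberOfTrips"].transform(
    lambda x: x.fillna(x.mean()).round()
)

print(toy)
```

The missing VP row receives the VP mean and the missing AVP row receives the AVP mean, which is why filtering on `isnull()` afterwards returns an empty frame.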
# Checking the rows with NumberOfChildrenVisiting missing
df[df["NumberOfChildrenVisiting"].isnull()]
| | ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 165 | 0 | 50.0 | Self Enquiry | 1 | 17.0 | Salaried | Female | 2 | 3.0 | King | 4.0 | Single | 4.0 | 0 | 5 | 1 | NaN | VP | 34926.0 |
| 190 | 0 | 52.0 | Self Enquiry | 1 | 6.0 | Salaried | Male | 3 | 4.0 | King | 3.0 | Divorced | 1.0 | 0 | 2 | 1 | NaN | VP | 34999.0 |
| 568 | 1 | 55.0 | Self Enquiry | 1 | 8.0 | Small Business | Male | 3 | 3.0 | King | 5.0 | Divorced | 6.0 | 1 | 2 | 1 | NaN | VP | 34859.0 |
| 746 | 0 | 41.0 | Company Invited | 1 | 13.0 | Salaried | Female | 2 | 5.0 | King | 3.0 | Married | 2.0 | 0 | 4 | 1 | NaN | VP | 34973.0 |
| 749 | 1 | 52.0 | Self Enquiry | 3 | 8.0 | Small Business | Female | 2 | 5.0 | King | 3.0 | Divorced | 2.0 | 0 | 3 | 1 | NaN | VP | 34845.0 |
| 851 | 0 | 56.0 | Self Enquiry | 1 | 10.0 | Large Business | Female | 3 | 3.0 | King | 5.0 | Married | 5.0 | 0 | 4 | 0 | NaN | VP | 34943.0 |
| 898 | 0 | 43.0 | Self Enquiry | 1 | 9.0 | Salaried | Male | 3 | 5.0 | King | 3.0 | Divorced | 4.0 | 0 | 5 | 1 | NaN | VP | 34740.0 |
| 918 | 0 | 51.0 | Company Invited | 3 | 15.0 | Salaried | Male | 2 | 3.0 | King | 4.0 | Married | 5.0 | 1 | 4 | 1 | NaN | VP | 34847.0 |
| 956 | 0 | 56.0 | Self Enquiry | 2 | 14.0 | Salaried | Male | 2 | 3.0 | King | 4.0 | Single | 7.0 | 0 | 4 | 1 | NaN | VP | 34717.0 |
| 1009 | 0 | 58.0 | Self Enquiry | 1 | 6.0 | Small Business | Female | 3 | 3.0 | King | 5.0 | Divorced | 4.0 | 1 | 1 | 1 | NaN | VP | 34701.0 |
| 1154 | 0 | 47.0 | Self Enquiry | 2 | 32.0 | Salaried | Female | 3 | 3.0 | King | 3.0 | Married | 4.0 | 0 | 4 | 1 | NaN | VP | 34658.0 |
| 1242 | 0 | 40.0 | Self Enquiry | 3 | 13.0 | Small Business | Male | 2 | 3.0 | King | 4.0 | Single | 2.0 | 0 | 4 | 1 | NaN | VP | 34833.0 |
| 1331 | 0 | 48.0 | Self Enquiry | 1 | 16.0 | Salaried | Male | 3 | 4.0 | King | 4.0 | Married | 5.0 | 0 | 3 | 1 | NaN | VP | 34665.0 |
| 1401 | 0 | 55.0 | Self Enquiry | 2 | 32.0 | Salaried | Male | 3 | 1.0 | King | 4.0 | Married | 5.0 | 1 | 5 | 1 | NaN | VP | 34636.0 |
| 1635 | 0 | 50.0 | Self Enquiry | 1 | 17.0 | Salaried | Female | 2 | 3.0 | King | 4.0 | Single | 4.0 | 0 | 5 | 0 | NaN | VP | 34926.0 |
| 1660 | 0 | 52.0 | Self Enquiry | 1 | 6.0 | Salaried | Male | 3 | 4.0 | King | 3.0 | Married | 1.0 | 0 | 1 | 1 | NaN | VP | 34999.0 |
| 2038 | 1 | 55.0 | Self Enquiry | 1 | 8.0 | Small Business | Male | 3 | 3.0 | King | 5.0 | Married | 6.0 | 1 | 1 | 1 | NaN | VP | 34859.0 |
| 2216 | 0 | 41.0 | Company Invited | 1 | 13.0 | Salaried | Female | 2 | 5.0 | King | 3.0 | Married | 2.0 | 0 | 4 | 1 | NaN | VP | 34973.0 |
| 2219 | 1 | 52.0 | Self Enquiry | 3 | 8.0 | Small Business | Female | 2 | 5.0 | King | 3.0 | Married | 2.0 | 0 | 3 | 1 | NaN | VP | 34845.0 |
| 2321 | 0 | 56.0 | Self Enquiry | 1 | 10.0 | Large Business | Female | 3 | 3.0 | King | 5.0 | Married | 5.0 | 0 | 4 | 1 | NaN | VP | 34943.0 |
| 2368 | 0 | 43.0 | Self Enquiry | 1 | 9.0 | Salaried | Male | 3 | 5.0 | King | 3.0 | Married | 4.0 | 0 | 5 | 1 | NaN | VP | 34740.0 |
| 2388 | 1 | 51.0 | Company Invited | 1 | 34.0 | Salaried | Male | 3 | 4.0 | Deluxe | 5.0 | Single | 4.0 | 0 | 3 | 0 | NaN | Manager | 34847.0 |
| 2426 | 1 | 56.0 | Self Enquiry | 3 | 22.0 | Salaried | Female | 3 | 3.0 | Standard | 5.0 | Single | 3.0 | 1 | 5 | 1 | NaN | Senior Manager | 34717.0 |
| 2638 | 0 | 46.0 | Company Invited | 1 | 9.0 | Small Business | Male | 4 | 5.0 | Super Deluxe | 3.0 | Divorced | 2.0 | 0 | 4 | 1 | NaN | AVP | 35470.0 |
| 2679 | 0 | 44.0 | Self Enquiry | 3 | 23.0 | Small Business | Female | 4 | 4.0 | Super Deluxe | 3.0 | Divorced | 7.0 | 1 | 3 | 0 | NaN | AVP | 34742.0 |
| 2707 | 0 | 47.0 | Self Enquiry | 3 | 9.0 | Large Business | Female | 4 | 6.0 | Super Deluxe | 4.0 | Divorced | 5.0 | 1 | 5 | 1 | NaN | AVP | 35550.0 |
| 2744 | 0 | 42.0 | Self Enquiry | 3 | 9.0 | Salaried | Male | 3 | 4.0 | Super Deluxe | 4.0 | Single | 3.0 | 0 | 3 | 1 | NaN | AVP | 34693.0 |
| 2792 | 0 | 43.0 | Self Enquiry | 1 | 30.0 | Salaried | Female | 3 | 4.0 | Super Deluxe | 3.0 | Single | 4.0 | 0 | 3 | 0 | NaN | AVP | 34670.0 |
| 2823 | 0 | 56.0 | Self Enquiry | 1 | 9.0 | Salaried | Female | 4 | 4.0 | Super Deluxe | 3.0 | Single | 4.0 | 1 | 5 | 0 | NaN | AVP | 35337.0 |
| 2852 | 0 | 53.0 | Self Enquiry | 1 | 11.0 | Salaried | Female | 2 | 4.0 | Super Deluxe | 3.0 | Divorced | 4.0 | 0 | 5 | 1 | NaN | AVP | 35233.0 |
| 2889 | 0 | 56.0 | Self Enquiry | 3 | 25.0 | Salaried | Female | 3 | 4.0 | Super Deluxe | 4.0 | Single | 5.0 | 0 | 2 | 0 | NaN | AVP | 35513.0 |
| 2899 | 0 | 34.0 | Self Enquiry | 1 | 7.0 | Small Business | Female | 4 | 3.0 | Super Deluxe | 3.0 | Married | 6.0 | 0 | 2 | 0 | NaN | AVP | 34862.0 |
| 2910 | 0 | 42.0 | Self Enquiry | 3 | 9.0 | Salaried | Female | 4 | 4.0 | Super Deluxe | 5.0 | Divorced | 2.0 | 0 | 5 | 1 | NaN | AVP | 35273.0 |
| 2933 | 0 | 44.0 | Self Enquiry | 1 | 13.0 | Salaried | Male | 3 | 5.0 | Super Deluxe | 3.0 | Married | 6.0 | 1 | 3 | 1 | NaN | AVP | 35305.0 |
| 3005 | 0 | 53.0 | Self Enquiry | 3 | 10.0 | Small Business | Male | 3 | 5.0 | Super Deluxe | 5.0 | Divorced | 3.0 | 0 | 5 | 1 | NaN | AVP | 35534.0 |
| 3036 | 0 | 48.0 | Self Enquiry | 1 | 9.0 | Salaried | Female | 3 | 4.0 | Super Deluxe | 3.0 | Divorced | 3.0 | 1 | 4 | 1 | NaN | AVP | 35430.0 |
| 3060 | 0 | 52.0 | Self Enquiry | 3 | 33.0 | Small Business | Female | 4 | 4.0 | Super Deluxe | 3.0 | Divorced | 4.0 | 0 | 3 | 0 | NaN | AVP | 34985.0 |
| 3218 | 0 | 56.0 | Company Invited | 1 | 9.0 | Small Business | Male | 3 | 5.0 | Super Deluxe | 5.0 | Single | 2.0 | 0 | 3 | 1 | NaN | AVP | 35434.0 |
| 3349 | 0 | 30.0 | Self Enquiry | 1 | 7.0 | Salaried | Female | 3 | 5.0 | Super Deluxe | 3.0 | Married | 5.0 | 0 | 2 | 0 | NaN | AVP | 34802.0 |
| 3389 | 0 | 51.0 | Self Enquiry | 1 | 35.0 | Salaried | Female | 3 | 4.0 | Super Deluxe | 5.0 | Divorced | 6.0 | 1 | 5 | 0 | NaN | AVP | 35558.0 |
| 3443 | 0 | 43.0 | Self Enquiry | 2 | 17.0 | Salaried | Female | 3 | 4.0 | Super Deluxe | 5.0 | Divorced | 2.0 | 0 | 3 | 0 | NaN | AVP | 35477.0 |
| 3458 | 0 | 32.0 | Self Enquiry | 1 | 15.0 | Salaried | Female | 4 | 4.0 | Super Deluxe | 4.0 | Single | 5.0 | 0 | 1 | 1 | NaN | AVP | 35100.0 |
| 3487 | 0 | 54.0 | Self Enquiry | 1 | 9.0 | Small Business | Male | 3 | 2.0 | Super Deluxe | 4.0 | Single | 6.0 | 0 | 1 | 1 | NaN | AVP | 35276.0 |
| 3520 | 0 | 55.0 | Company Invited | 1 | 18.0 | Small Business | Female | 3 | 4.0 | Super Deluxe | 3.0 | Married | 5.0 | 0 | 3 | 1 | NaN | AVP | 34710.0 |
| 3522 | 0 | 45.0 | Self Enquiry | 1 | 35.0 | Salaried | Male | 3 | 4.0 | Super Deluxe | 5.0 | Divorced | 5.0 | 0 | 1 | 1 | NaN | AVP | 35006.0 |
| 3524 | 0 | 47.0 | Self Enquiry | 3 | 10.0 | Salaried | Female | 3 | 4.0 | Super Deluxe | 4.0 | Divorced | 2.0 | 0 | 4 | 1 | NaN | AVP | 35284.0 |
| 3540 | 0 | 41.0 | Self Enquiry | 2 | 13.0 | Small Business | Male | 3 | 4.0 | Super Deluxe | 3.0 | Single | 3.0 | 1 | 3 | 1 | NaN | AVP | 35115.0 |
| 3620 | 0 | 50.0 | Self Enquiry | 1 | 29.0 | Salaried | Female | 4 | 4.0 | Super Deluxe | 4.0 | Married | 5.0 | 0 | 4 | 0 | NaN | AVP | 35091.0 |
| 3638 | 0 | 48.0 | Self Enquiry | 3 | 9.0 | Salaried | Female | 3 | 2.0 | Super Deluxe | 4.0 | Married | 8.0 | 0 | 1 | 0 | NaN | AVP | 34650.0 |
| 3669 | 0 | 46.0 | Self Enquiry | 1 | 35.0 | Large Business | Female | 3 | 5.0 | Super Deluxe | 4.0 | Single | 3.0 | 0 | 3 | 1 | NaN | AVP | 35382.0 |
| 3792 | 0 | 41.0 | Self Enquiry | 1 | 7.0 | Salaried | Male | 4 | 4.0 | Super Deluxe | 5.0 | Married | 4.0 | 0 | 3 | 1 | NaN | AVP | 35501.0 |
| 4108 | 0 | 46.0 | Company Invited | 1 | 9.0 | Small Business | Male | 4 | 5.0 | Super Deluxe | 3.0 | Married | 2.0 | 0 | 4 | 1 | NaN | AVP | 35470.0 |
| 4149 | 0 | 44.0 | Self Enquiry | 3 | 23.0 | Small Business | Female | 4 | 4.0 | Super Deluxe | 3.0 | Married | 7.0 | 1 | 3 | 0 | NaN | AVP | 34742.0 |
| 4214 | 0 | 42.0 | Self Enquiry | 3 | 9.0 | Salaried | Male | 3 | 4.0 | Super Deluxe | 4.0 | Single | 3.0 | 0 | 3 | 1 | NaN | AVP | 34693.0 |
| 4262 | 0 | 43.0 | Self Enquiry | 1 | 30.0 | Salaried | Female | 3 | 4.0 | Super Deluxe | 3.0 | Single | 4.0 | 0 | 3 | 0 | NaN | AVP | 34670.0 |
| 4293 | 0 | 56.0 | Self Enquiry | 1 | 9.0 | Salaried | Female | 4 | 4.0 | Super Deluxe | 3.0 | Single | 4.0 | 1 | 5 | 0 | NaN | AVP | 35337.0 |
| 4322 | 0 | 53.0 | Self Enquiry | 1 | 11.0 | Salaried | Female | 2 | 4.0 | Super Deluxe | 3.0 | Married | 4.0 | 0 | 5 | 1 | NaN | AVP | 35233.0 |
| 4359 | 0 | 56.0 | Self Enquiry | 3 | 25.0 | Salaried | Female | 3 | 4.0 | Super Deluxe | 4.0 | Single | 5.0 | 0 | 1 | 1 | NaN | AVP | 35513.0 |
| 4369 | 0 | 34.0 | Self Enquiry | 1 | 7.0 | Small Business | Female | 4 | 2.0 | Super Deluxe | 3.0 | Married | 6.0 | 0 | 1 | 1 | NaN | AVP | 34862.0 |
| 4380 | 0 | 42.0 | Self Enquiry | 3 | 9.0 | Salaried | Female | 4 | 4.0 | Super Deluxe | 5.0 | Married | 2.0 | 0 | 5 | 1 | NaN | AVP | 35273.0 |
| 4403 | 0 | 44.0 | Self Enquiry | 1 | 13.0 | Salaried | Male | 3 | 5.0 | Super Deluxe | 3.0 | Married | 6.0 | 1 | 3 | 1 | NaN | AVP | 35305.0 |
| 4475 | 0 | 53.0 | Self Enquiry | 3 | 10.0 | Small Business | Male | 3 | 5.0 | Super Deluxe | 5.0 | Married | 3.0 | 0 | 5 | 1 | NaN | AVP | 35534.0 |
| 4506 | 0 | 48.0 | Self Enquiry | 1 | 9.0 | Salaried | Female | 3 | 4.0 | Super Deluxe | 3.0 | Married | 3.0 | 1 | 4 | 1 | NaN | AVP | 35430.0 |
| 4530 | 0 | 52.0 | Self Enquiry | 3 | 33.0 | Small Business | Female | 4 | 4.0 | Super Deluxe | 3.0 | Married | 4.0 | 0 | 3 | 1 | NaN | AVP | 34985.0 |
| 4688 | 0 | 56.0 | Company Invited | 1 | 9.0 | Small Business | Male | 3 | 5.0 | Super Deluxe | 5.0 | Single | 2.0 | 0 | 3 | 1 | NaN | AVP | 35434.0 |
| 4819 | 1 | 30.0 | Self Enquiry | 1 | 14.0 | Large Business | Female | 3 | 4.0 | Basic | 3.0 | Married | 5.0 | 1 | 4 | 1 | NaN | Executive | 34802.0 |
df.NumberOfChildrenVisiting.mean()
1.1872666943177106
# We'll impute each missing value with the mean NumberOfChildrenVisiting of its NumberOfPersonVisiting and MaritalStatus group
df.groupby(["NumberOfPersonVisiting", "MaritalStatus"])[
    "NumberOfChildrenVisiting"
].mean().round(0)
NumberOfPersonVisiting MaritalStatus
1 Divorced 0.0
Married 0.0
Single 0.0
Unmarried 0.0
2 Divorced 0.0
Married 1.0
Single 0.0
Unmarried 1.0
3 Divorced 1.0
Married 1.0
Single 1.0
Unmarried 1.0
4 Divorced 2.0
Married 2.0
Single 2.0
Unmarried 2.0
5 Divorced NaN
Married 2.0
Single 2.0
Unmarried 2.0
Name: NumberOfChildrenVisiting, dtype: float64
# Impute missing values of NumberOfChildrenVisiting
df["NumberOfChildrenVisiting"] = df.groupby(
    ["NumberOfPersonVisiting", "MaritalStatus"]
)["NumberOfChildrenVisiting"].transform(lambda x: round(x.fillna(x.mean())))
df[df["NumberOfChildrenVisiting"].isnull()]
| ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
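The grouped means listed earlier show `NaN` for the (5, Divorced) combination. The empty result above suggests that no missing row fell into that group here, but if one had, a group-wise fill alone would have left it missing. The sketch below shows one possible fallback, assuming the overall column mean is an acceptable second choice. The toy values and the names `keys`, `col`, `group_mean` and `filled` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): the (5, "Divorced") group has no observed values at all
toy = pd.DataFrame(
    {
        "NumberOfPersonVisiting": [3, 3, 3, 5, 5],
        "MaritalStatus": ["Married", "Married", "Married", "Divorced", "Divorced"],
        "NumberOfChildrenVisiting": [1.0, np.nan, 1.0, np.nan, np.nan],
    }
)

keys = ["NumberOfPersonVisiting", "MaritalStatus"]
col = "NumberOfChildrenVisiting"

# Step 1: group-wise mean; a group with only NaN values yields NaN and fills nothing
group_mean = toy.groupby(keys)[col].transform("mean")
filled = toy[col].fillna(group_mean)

# Step 2 (assumed fallback): use the overall column mean for whatever is still missing
filled = filled.fillna(toy[col].mean()).round()

print(filled)
```

Whether the overall mean, a coarser grouping, or a constant is the right fallback depends on the variable; the point is only that the group-wise step should be followed by a check for leftovers.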
# Checking the rows with NumberOfFollowups missing
df[df["NumberOfFollowups"].isnull()]
| | ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 79 | 0 | 46.0 | Self Enquiry | 2 | 11.0 | Small Business | Male | 3 | NaN | Deluxe | 4.0 | Married | 1.0 | 1 | 5 | 0 | 1.0 | Manager | 20021.0 |
| 94 | 0 | 32.0 | Self Enquiry | 3 | 12.0 | Small Business | Male | 2 | NaN | Deluxe | 3.0 | Single | 2.0 | 0 | 5 | 1 | 0.0 | Manager | 20010.0 |
| 96 | 0 | 24.0 | Self Enquiry | 3 | 9.0 | Salaried | Female | 2 | NaN | Deluxe | 3.0 | Divorced | 1.0 | 0 | 4 | 1 | 0.0 | Manager | 19999.0 |
| 122 | 1 | 56.0 | Self Enquiry | 1 | 20.0 | Salaried | Female | 3 | NaN | Basic | 4.0 | Divorced | 1.0 | 1 | 5 | 1 | 1.0 | Executive | 19963.0 |
| 135 | 0 | 36.0 | Self Enquiry | 1 | 12.0 | Small Business | Male | 2 | NaN | Deluxe | 4.0 | Married | 7.0 | 0 | 4 | 1 | 1.0 | Manager | 19941.0 |
| 174 | 0 | 45.0 | Self Enquiry | 3 | 10.0 | Salaried | Female | 1 | NaN | Deluxe | 5.0 | Married | 5.0 | 1 | 4 | 0 | 0.0 | Manager | 20006.0 |
| 317 | 1 | 52.0 | Self Enquiry | 1 | 14.0 | Small Business | Male | 2 | NaN | Deluxe | 4.0 | Divorced | 3.0 | 0 | 2 | 1 | 1.0 | Manager | 19941.0 |
| 322 | 0 | 32.0 | Self Enquiry | 1 | 8.0 | Small Business | Female | 3 | NaN | Deluxe | 3.0 | Single | 1.0 | 0 | 3 | 1 | 2.0 | Manager | 20055.0 |
| 376 | 0 | 51.0 | Self Enquiry | 3 | 20.0 | Salaried | Female | 2 | NaN | Deluxe | 3.0 | Divorced | 5.0 | 0 | 3 | 0 | 1.0 | Manager | 19936.0 |
| 532 | 0 | 47.0 | Self Enquiry | 3 | 20.0 | Small Business | Male | 2 | NaN | Deluxe | 5.0 | Single | 3.0 | 0 | 2 | 0 | 1.0 | Manager | 19960.0 |
| 629 | 0 | 28.0 | Self Enquiry | 2 | 14.0 | Small Business | Male | 3 | NaN | Basic | 3.0 | Married | 2.0 | 0 | 2 | 0 | 1.0 | Executive | 19936.0 |
| 737 | 0 | 41.0 | Self Enquiry | 1 | 13.0 | Small Business | Female | 2 | NaN | Deluxe | 3.0 | Single | 7.0 | 0 | 3 | 1 | 0.0 | Manager | 20003.0 |
| 748 | 1 | 26.0 | Company Invited | 3 | 35.0 | Small Business | Male | 3 | NaN | Deluxe | 5.0 | Single | 1.0 | 0 | 3 | 0 | 0.0 | Manager | 19969.0 |
| 820 | 0 | 35.0 | Company Invited | 3 | 17.0 | Small Business | Male | 2 | NaN | Deluxe | 3.0 | Married | 2.0 | 0 | 5 | 0 | 0.0 | Manager | 19968.0 |
| 881 | 0 | 32.0 | Company Invited | 1 | 8.0 | Salaried | Female | 2 | NaN | Deluxe | 3.0 | Single | 5.0 | 1 | 5 | 1 | 1.0 | Manager | 19998.0 |
| 885 | 0 | 25.0 | Self Enquiry | 3 | 16.0 | Salaried | Male | 2 | NaN | Deluxe | 3.0 | Single | 1.0 | 0 | 2 | 0 | 1.0 | Manager | 19950.0 |
| 1159 | 0 | 39.0 | Company Invited | 1 | 10.0 | Small Business | Female | 2 | NaN | Deluxe | 3.0 | Single | 1.0 | 0 | 3 | 1 | 0.0 | Manager | 20042.0 |
| 1215 | 0 | 35.0 | Company Invited | 1 | 8.0 | Small Business | Male | 3 | NaN | Basic | 3.0 | Single | 1.0 | 1 | 3 | 1 | 0.0 | Executive | 19930.0 |
| 1234 | 0 | 47.0 | Company Invited | 3 | 8.0 | Small Business | Male | 2 | NaN | Deluxe | 4.0 | Married | 1.0 | 0 | 1 | 0 | 1.0 | Manager | 19978.0 |
| 1244 | 0 | 30.0 | Company Invited | 1 | 8.0 | Large Business | Female | 3 | NaN | Basic | 5.0 | Single | 1.0 | 0 | 3 | 0 | 1.0 | Executive | 19968.0 |
| 1352 | 0 | 44.0 | Self Enquiry | 1 | 6.0 | Salaried | Male | 3 | NaN | Deluxe | 5.0 | Married | 3.0 | 0 | 3 | 1 | 2.0 | Manager | 20033.0 |
| 1389 | 0 | 31.0 | Company Invited | 1 | 6.0 | Salaried | Male | 2 | NaN | Deluxe | 5.0 | Married | 2.0 | 0 | 1 | 1 | 1.0 | Manager | 20003.0 |
| 1549 | 0 | 46.0 | Self Enquiry | 2 | 11.0 | Small Business | Male | 3 | NaN | Deluxe | 4.0 | Married | 1.0 | 1 | 5 | 1 | 2.0 | Manager | 20021.0 |
| 1564 | 0 | 32.0 | Self Enquiry | 3 | 12.0 | Small Business | Male | 2 | NaN | Deluxe | 3.0 | Single | 2.0 | 0 | 5 | 1 | 1.0 | Manager | 20010.0 |
| 1566 | 0 | 24.0 | Self Enquiry | 3 | 9.0 | Salaried | Female | 2 | NaN | Deluxe | 3.0 | Married | 1.0 | 0 | 4 | 0 | 0.0 | Manager | 19999.0 |
| 1592 | 1 | 56.0 | Self Enquiry | 1 | 20.0 | Salaried | Female | 3 | NaN | Basic | 4.0 | Married | 1.0 | 1 | 5 | 1 | 0.0 | Executive | 19963.0 |
| 1605 | 0 | 36.0 | Self Enquiry | 1 | 12.0 | Small Business | Male | 2 | NaN | Deluxe | 4.0 | Married | 7.0 | 0 | 4 | 0 | 1.0 | Manager | 19941.0 |
| 1644 | 0 | 45.0 | Self Enquiry | 3 | 10.0 | Salaried | Female | 1 | NaN | Deluxe | 5.0 | Married | 5.0 | 1 | 4 | 0 | 0.0 | Manager | 20006.0 |
| 1787 | 1 | 52.0 | Self Enquiry | 1 | 14.0 | Small Business | Male | 2 | NaN | Deluxe | 4.0 | Married | 3.0 | 0 | 1 | 1 | 0.0 | Manager | 19941.0 |
| 1846 | 0 | 51.0 | Self Enquiry | 3 | 20.0 | Salaried | Female | 2 | NaN | Deluxe | 3.0 | Married | 5.0 | 0 | 3 | 1 | 0.0 | Manager | 19936.0 |
| 2002 | 0 | 47.0 | Self Enquiry | 3 | 20.0 | Small Business | Male | 2 | NaN | Deluxe | 5.0 | Single | 3.0 | 0 | 1 | 1 | 1.0 | Manager | 19960.0 |
| 2099 | 0 | 28.0 | Self Enquiry | 2 | 14.0 | Small Business | Male | 3 | NaN | Basic | 3.0 | Married | 2.0 | 0 | 1 | 1 | 0.0 | Executive | 19936.0 |
| 2129 | 0 | 28.0 | Self Enquiry | 3 | 11.0 | Small Business | Male | 3 | NaN | Deluxe | 3.0 | Single | 2.0 | 0 | 3 | 0 | 1.0 | Manager | 19908.0 |
| 2207 | 0 | 41.0 | Self Enquiry | 1 | 13.0 | Small Business | Female | 2 | NaN | Deluxe | 3.0 | Single | 7.0 | 0 | 3 | 1 | 1.0 | Manager | 20003.0 |
| 2218 | 1 | 26.0 | Company Invited | 3 | 35.0 | Small Business | Male | 3 | NaN | Deluxe | 5.0 | Single | 1.0 | 0 | 3 | 1 | 1.0 | Manager | 19969.0 |
| 2290 | 0 | 35.0 | Company Invited | 3 | 17.0 | Small Business | Male | 2 | NaN | Deluxe | 3.0 | Married | 2.0 | 0 | 5 | 1 | 1.0 | Manager | 19968.0 |
| 2351 | 0 | 32.0 | Company Invited | 1 | 8.0 | Salaried | Female | 2 | NaN | Deluxe | 3.0 | Single | 5.0 | 1 | 5 | 1 | 0.0 | Manager | 19998.0 |
| 2355 | 0 | 25.0 | Self Enquiry | 3 | 16.0 | Salaried | Male | 2 | NaN | Deluxe | 3.0 | Single | 1.0 | 0 | 1 | 1 | 0.0 | Manager | 19950.0 |
| 2467 | 0 | 22.0 | Self Enquiry | 1 | 22.0 | Salaried | Male | 4 | NaN | Basic | 3.0 | Single | 3.0 | 0 | 3 | 1 | 2.0 | Executive | 19910.0 |
| 2959 | 0 | 36.0 | Company Invited | 1 | 10.0 | Salaried | Male | 3 | NaN | Basic | 3.0 | Divorced | 3.0 | 0 | 2 | 1 | 2.0 | Executive | 19959.0 |
| 3456 | 1 | 32.0 | Company Invited | 3 | 7.0 | Salaried | Female | 3 | NaN | Basic | 3.0 | Single | 3.0 | 0 | 3 | 1 | 2.0 | Executive | 20037.0 |
| 3460 | 1 | 32.0 | Self Enquiry | 1 | 15.0 | Salaried | Female | 3 | NaN | Basic | 4.0 | Single | 3.0 | 0 | 4 | 0 | 2.0 | Executive | 19939.0 |
| 3496 | 0 | 31.0 | Company Invited | 1 | 14.0 | Large Business | Male | 4 | NaN | Basic | 3.0 | Married | 3.0 | 0 | 5 | 0 | 1.0 | Executive | 19952.0 |
| 3937 | 0 | 22.0 | Self Enquiry | 1 | 22.0 | Salaried | Male | 4 | NaN | Basic | 3.0 | Single | 3.0 | 0 | 5 | 1 | 2.0 | Executive | 19910.0 |
| 4429 | 0 | 36.0 | Company Invited | 1 | 10.0 | Salaried | Male | 3 | NaN | Basic | 3.0 | Married | 3.0 | 0 | 1 | 1 | 2.0 | Executive | 19959.0 |
# We'll impute each missing value with the mean NumberOfFollowups of its Designation and DurationOfPitch group
df.groupby(["Designation", "DurationOfPitch"])["NumberOfFollowups"].mean().round(0)
Designation DurationOfPitch
AVP 5.0 NaN
6.0 3.0
7.0 4.0
8.0 3.0
9.0 4.0
10.0 4.0
11.0 4.0
12.0 3.0
13.0 4.0
14.0 3.0
15.0 4.0
16.0 4.0
17.0 4.0
18.0 4.0
19.0 3.0
20.0 3.0
21.0 4.0
22.0 4.0
23.0 4.0
24.0 4.0
25.0 4.0
26.0 3.0
27.0 4.0
28.0 4.0
29.0 4.0
30.0 3.0
31.0 4.0
32.0 3.0
33.0 4.0
34.0 2.0
35.0 4.0
36.0 5.0
Executive 5.0 4.0
6.0 3.0
7.0 4.0
8.0 3.0
9.0 4.0
10.0 4.0
11.0 3.0
12.0 4.0
13.0 4.0
14.0 3.0
15.0 4.0
16.0 4.0
17.0 4.0
18.0 3.0
19.0 4.0
20.0 4.0
21.0 4.0
22.0 3.0
23.0 4.0
24.0 4.0
25.0 3.0
26.0 3.0
27.0 3.0
28.0 4.0
29.0 4.0
30.0 4.0
31.0 4.0
32.0 4.0
33.0 4.0
34.0 3.0
35.0 3.0
36.0 4.0
Manager 5.0 3.0
6.0 3.0
7.0 4.0
8.0 3.0
9.0 4.0
10.0 4.0
11.0 4.0
12.0 4.0
13.0 4.0
14.0 4.0
15.0 4.0
16.0 4.0
17.0 4.0
18.0 4.0
19.0 4.0
20.0 4.0
21.0 4.0
22.0 4.0
23.0 4.0
24.0 3.0
25.0 4.0
26.0 3.0
27.0 4.0
28.0 4.0
29.0 4.0
30.0 4.0
31.0 4.0
32.0 4.0
33.0 4.0
34.0 4.0
35.0 4.0
36.0 5.0
Senior Manager 5.0 3.0
6.0 3.0
7.0 4.0
8.0 3.0
9.0 4.0
10.0 4.0
11.0 4.0
12.0 4.0
13.0 4.0
14.0 4.0
15.0 4.0
16.0 4.0
17.0 4.0
18.0 4.0
19.0 4.0
20.0 3.0
21.0 4.0
22.0 4.0
23.0 4.0
24.0 4.0
25.0 4.0
26.0 4.0
27.0 4.0
28.0 2.0
29.0 3.0
30.0 4.0
31.0 4.0
32.0 4.0
33.0 3.0
34.0 4.0
35.0 4.0
36.0 4.0
VP 5.0 4.0
6.0 3.0
7.0 4.0
8.0 4.0
9.0 4.0
10.0 4.0
11.0 4.0
12.0 4.0
13.0 4.0
14.0 4.0
15.0 4.0
16.0 4.0
17.0 4.0
18.0 4.0
19.0 4.0
20.0 3.0
21.0 4.0
22.0 4.0
23.0 NaN
24.0 4.0
25.0 5.0
26.0 NaN
27.0 NaN
28.0 NaN
29.0 NaN
30.0 NaN
31.0 3.0
32.0 3.0
33.0 4.0
34.0 NaN
35.0 NaN
36.0 NaN
Name: NumberOfFollowups, dtype: float64
# Impute missing values of NumberOfFollowups
df["NumberOfFollowups"] = df.groupby(["Designation", "DurationOfPitch"])[
"NumberOfFollowups"
].transform(lambda x: round(x.fillna(x.mean())))
df[df["NumberOfFollowups"].isnull()]
| ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
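The grouped means above are NaN for several (Designation, DurationOfPitch) pairs, for example VP with a pitch duration of 23.0. A row falling in such a group would stay missing after the transform. That did not happen here, but the following is a minimal sketch of a fallback chain, assuming toy values and the hypothetical names `toy`, `fine`, `coarse` and `filled`: first the fine-grained group mean, then the mean per Designation.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values): the (VP, 23.0) group has no observed value.
toy = pd.DataFrame(
    {
        "Designation": ["VP", "VP", "VP", "Manager", "Manager", "Manager"],
        "DurationOfPitch": [23.0, 23.0, 24.0, 10.0, 10.0, 10.0],
        "NumberOfFollowups": [np.nan, np.nan, 4.0, 3.0, np.nan, 3.0],
    }
)

# Step 1: mean within (Designation, DurationOfPitch); all-NaN groups stay NaN.
fine = toy.groupby(["Designation", "DurationOfPitch"])["NumberOfFollowups"].transform(
    "mean"
)
# Step 2: coarser fallback, mean within Designation only.
coarse = toy.groupby("Designation")["NumberOfFollowups"].transform("mean")

# Fill from the fine-grained mean first, then from the coarser one.
filled = toy["NumberOfFollowups"].fillna(fine).fillna(coarse).round()
print(filled.tolist())
```

The order of the `fillna` calls matters: the most specific estimate is used whenever it exists, and the coarser one only covers the remaining gaps.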
# Checking the rows with PreferredPropertyStar missing
df[df["PreferredPropertyStar"].isnull()]
| | ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 38 | 0 | 36.0 | Self Enquiry | 1 | 11.0 | Salaried | Female | 2 | 4.0 | Basic | NaN | Divorced | 1.0 | 1 | 2 | 1 | 0.0 | Executive | 20000.0 |
| 2609 | 0 | 51.0 | Self Enquiry | 1 | 18.0 | Salaried | Female | 3 | 4.0 | King | NaN | Single | 5.0 | 0 | 5 | 1 | 1.0 | VP | 38604.0 |
| 2634 | 0 | 53.0 | Self Enquiry | 1 | 7.0 | Salaried | Male | 4 | 5.0 | King | NaN | Divorced | 2.0 | 0 | 2 | 1 | 2.0 | VP | 38677.0 |
| 3012 | 1 | 56.0 | Self Enquiry | 1 | 9.0 | Small Business | Male | 4 | 4.0 | King | NaN | Divorced | 7.0 | 1 | 2 | 1 | 3.0 | VP | 38537.0 |
| 3190 | 0 | 42.0 | Company Invited | 1 | 14.0 | Salaried | Female | 3 | 6.0 | King | NaN | Married | 3.0 | 0 | 4 | 1 | 1.0 | VP | 38651.0 |
| 3193 | 1 | 53.0 | Self Enquiry | 3 | 9.0 | Small Business | Female | 3 | 6.0 | King | NaN | Divorced | 3.0 | 0 | 3 | 1 | 1.0 | VP | 38523.0 |
| 3214 | 0 | 47.0 | Self Enquiry | 1 | 7.0 | Small Business | Male | 3 | 4.0 | King | NaN | Married | 2.0 | 0 | 5 | 1 | 2.0 | VP | 38305.0 |
| 3295 | 0 | 57.0 | Self Enquiry | 1 | 11.0 | Large Business | Female | 4 | 4.0 | King | NaN | Married | 6.0 | 0 | 4 | 0 | 3.0 | VP | 38621.0 |
| 3342 | 0 | 44.0 | Self Enquiry | 1 | 10.0 | Salaried | Male | 4 | 6.0 | King | NaN | Divorced | 5.0 | 0 | 5 | 1 | 3.0 | VP | 38418.0 |
| 3362 | 0 | 52.0 | Company Invited | 3 | 16.0 | Salaried | Male | 3 | 4.0 | King | NaN | Married | 6.0 | 1 | 4 | 1 | 2.0 | VP | 38525.0 |
| 3400 | 0 | 57.0 | Self Enquiry | 2 | 15.0 | Salaried | Male | 3 | 4.0 | King | NaN | Single | 8.0 | 0 | 4 | 1 | 1.0 | VP | 38395.0 |
| 3453 | 0 | 59.0 | Self Enquiry | 1 | 7.0 | Small Business | Female | 4 | 4.0 | King | NaN | Divorced | 5.0 | 1 | 1 | 1 | 3.0 | VP | 38379.0 |
| 3598 | 0 | 48.0 | Self Enquiry | 2 | 33.0 | Salaried | Female | 4 | 4.0 | King | NaN | Married | 5.0 | 0 | 4 | 1 | 2.0 | VP | 38336.0 |
| 3686 | 0 | 41.0 | Self Enquiry | 3 | 14.0 | Small Business | Male | 3 | 4.0 | King | NaN | Single | 3.0 | 0 | 4 | 1 | 2.0 | VP | 38511.0 |
| 3775 | 0 | 49.0 | Self Enquiry | 1 | 17.0 | Salaried | Male | 4 | 5.0 | King | NaN | Married | 6.0 | 0 | 3 | 1 | 2.0 | VP | 38343.0 |
| 3845 | 0 | 56.0 | Self Enquiry | 2 | 33.0 | Salaried | Male | 4 | 2.0 | King | NaN | Married | 6.0 | 1 | 5 | 1 | 3.0 | VP | 38314.0 |
| 4079 | 0 | 51.0 | Self Enquiry | 1 | 18.0 | Salaried | Female | 3 | 4.0 | King | NaN | Single | 5.0 | 0 | 5 | 0 | 2.0 | VP | 38604.0 |
| 4104 | 0 | 53.0 | Self Enquiry | 1 | 7.0 | Salaried | Male | 4 | 5.0 | King | NaN | Married | 2.0 | 0 | 1 | 1 | 3.0 | VP | 38677.0 |
| 4482 | 1 | 56.0 | Self Enquiry | 1 | 9.0 | Small Business | Male | 4 | 4.0 | King | NaN | Married | 7.0 | 1 | 1 | 1 | 1.0 | VP | 38537.0 |
| 4660 | 0 | 42.0 | Company Invited | 1 | 14.0 | Salaried | Female | 3 | 6.0 | King | NaN | Married | 3.0 | 0 | 4 | 1 | 2.0 | VP | 38651.0 |
| 4663 | 1 | 53.0 | Self Enquiry | 3 | 9.0 | Small Business | Female | 3 | 6.0 | King | NaN | Married | 3.0 | 0 | 3 | 1 | 2.0 | VP | 38523.0 |
| 4684 | 0 | 47.0 | Self Enquiry | 1 | 7.0 | Small Business | Male | 3 | 4.0 | King | NaN | Married | 2.0 | 0 | 5 | 1 | 1.0 | VP | 38305.0 |
| 4765 | 0 | 57.0 | Self Enquiry | 1 | 11.0 | Large Business | Female | 4 | 4.0 | King | NaN | Married | 6.0 | 0 | 4 | 1 | 2.0 | VP | 38621.0 |
| 4812 | 0 | 44.0 | Self Enquiry | 1 | 10.0 | Salaried | Male | 4 | 6.0 | King | NaN | Married | 5.0 | 0 | 5 | 1 | 1.0 | VP | 38418.0 |
| 4832 | 1 | 52.0 | Company Invited | 1 | 35.0 | Salaried | Male | 4 | 5.0 | Deluxe | NaN | Single | 5.0 | 0 | 3 | 0 | 1.0 | Manager | 38525.0 |
| 4870 | 1 | 57.0 | Self Enquiry | 3 | 23.0 | Salaried | Female | 4 | 4.0 | Standard | NaN | Single | 4.0 | 1 | 5 | 1 | 3.0 | Senior Manager | 38395.0 |
# We'll impute these missing values one by one, by taking mean of PreferredPropertyStar for the particular Designation and ProductPitched
df.groupby(["Designation", "ProductPitched"])["PreferredPropertyStar"].mean().round(0)
Designation ProductPitched
AVP Basic NaN
Deluxe NaN
King NaN
Standard NaN
Super Deluxe 4.0
Executive Basic 4.0
Deluxe NaN
King NaN
Standard NaN
Super Deluxe NaN
Manager Basic NaN
Deluxe 4.0
King NaN
Standard NaN
Super Deluxe NaN
Senior Manager Basic NaN
Deluxe NaN
King NaN
Standard 4.0
Super Deluxe NaN
VP Basic NaN
Deluxe NaN
King 3.0
Standard NaN
Super Deluxe NaN
Name: PreferredPropertyStar, dtype: float64
# Impute missing values of PreferredPropertyStar
df["PreferredPropertyStar"] = df.groupby(["Designation", "ProductPitched"])[
"PreferredPropertyStar"
].transform(lambda x: round(x.fillna(x.mean())))
df[df["PreferredPropertyStar"].isnull()]
| ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
# Checking the rows with TypeofContact missing
df[df["TypeofContact"].isnull()]
| | ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 224 | 0 | 31.0 | NaN | 1 | 17.0 | Small Business | Male | 2 | 5.0 | Deluxe | 3.0 | Divorced | 1.0 | 0 | 3 | 1 | 0.0 | Manager | 22697.0 |
| 571 | 0 | 26.0 | NaN | 1 | 15.0 | Salaried | Female | 3 | 5.0 | Basic | 3.0 | Married | 4.0 | 0 | 4 | 1 | 2.0 | Executive | 19932.0 |
| 572 | 0 | 29.0 | NaN | 1 | 17.0 | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Divorced | 5.0 | 0 | 2 | 1 | 0.0 | Manager | 22639.0 |
| 576 | 0 | 27.0 | NaN | 3 | 17.0 | Small Business | Male | 2 | 3.0 | Deluxe | 3.0 | Divorced | 1.0 | 0 | 3 | 0 | 1.0 | Manager | 22697.0 |
| 579 | 0 | 34.0 | NaN | 1 | 16.0 | Small Business | Female | 2 | 4.0 | Basic | 5.0 | Single | 2.0 | 0 | 2 | 1 | 1.0 | Executive | 19859.0 |
| 598 | 1 | 28.0 | NaN | 1 | 16.0 | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Single | 7.0 | 0 | 3 | 0 | 0.0 | Executive | 19829.0 |
| 622 | 0 | 32.0 | NaN | 3 | 15.0 | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Married | 3.0 | 0 | 2 | 0 | 0.0 | Manager | 22889.0 |
| 724 | 0 | 24.0 | NaN | 1 | 17.0 | Small Business | Female | 2 | 4.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 3 | 1 | 1.0 | Manager | 22639.0 |
| 843 | 0 | 26.0 | NaN | 1 | 16.0 | Small Business | Male | 2 | 1.0 | Basic | 3.0 | Divorced | 2.0 | 0 | 5 | 1 | 1.0 | Executive | 19829.0 |
| 1021 | 1 | 25.0 | NaN | 3 | 15.0 | Salaried | Male | 3 | 4.0 | Basic | 5.0 | Divorced | 4.0 | 0 | 1 | 1 | 0.0 | Executive | 19778.0 |
| 1047 | 0 | 33.0 | NaN | 3 | 17.0 | Small Business | Male | 2 | 3.0 | Deluxe | 5.0 | Divorced | 1.0 | 0 | 3 | 0 | 0.0 | Manager | 22697.0 |
| 1143 | 0 | 45.0 | NaN | 3 | 17.0 | Small Business | Male | 2 | 4.0 | Deluxe | 5.0 | Married | 2.0 | 0 | 3 | 0 | 0.0 | Manager | 22697.0 |
| 1182 | 0 | 36.0 | NaN | 1 | 17.0 | Small Business | Female | 2 | 4.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 5 | 1 | 1.0 | Manager | 22639.0 |
| 1217 | 0 | 24.0 | NaN | 1 | 16.0 | Small Business | Male | 3 | 1.0 | Basic | 3.0 | Married | 2.0 | 0 | 1 | 0 | 0.0 | Executive | 19829.0 |
| 1356 | 0 | 41.0 | NaN | 3 | 17.0 | Small Business | Female | 2 | 3.0 | Deluxe | 4.0 | Married | 6.0 | 0 | 3 | 1 | 1.0 | Manager | 22639.0 |
| 1469 | 0 | 34.0 | NaN | 1 | 17.0 | Small Business | Male | 2 | 1.0 | Deluxe | 3.0 | Married | 3.0 | 0 | 3 | 0 | 1.0 | Manager | 22697.0 |
| 1694 | 0 | 31.0 | NaN | 1 | 17.0 | Small Business | Male | 2 | 5.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 3 | 0 | 0.0 | Manager | 22697.0 |
| 2041 | 0 | 26.0 | NaN | 1 | 15.0 | Salaried | Female | 3 | 5.0 | Basic | 3.0 | Married | 4.0 | 0 | 4 | 1 | 0.0 | Executive | 19932.0 |
| 2042 | 0 | 29.0 | NaN | 1 | 17.0 | Small Business | Female | 3 | 3.0 | Deluxe | 3.0 | Married | 5.0 | 0 | 1 | 0 | 1.0 | Manager | 22639.0 |
| 2046 | 0 | 27.0 | NaN | 3 | 17.0 | Small Business | Male | 2 | 3.0 | Deluxe | 3.0 | Married | 1.0 | 0 | 3 | 1 | 1.0 | Manager | 22697.0 |
| 2049 | 0 | 34.0 | NaN | 1 | 16.0 | Small Business | Female | 2 | 4.0 | Basic | 5.0 | Single | 2.0 | 0 | 1 | 1 | 0.0 | Executive | 19859.0 |
| 2068 | 1 | 28.0 | NaN | 1 | 16.0 | Small Business | Male | 2 | 3.0 | Basic | 3.0 | Single | 7.0 | 0 | 3 | 1 | 1.0 | Executive | 19829.0 |
| 2092 | 0 | 32.0 | NaN | 3 | 15.0 | Salaried | Male | 3 | 3.0 | Deluxe | 3.0 | Married | 3.0 | 0 | 1 | 0 | 2.0 | Manager | 22889.0 |
| 2194 | 0 | 24.0 | NaN | 1 | 17.0 | Small Business | Female | 2 | 4.0 | Deluxe | 3.0 | Married | 2.0 | 0 | 3 | 0 | 0.0 | Manager | 22639.0 |
| 2313 | 0 | 26.0 | NaN | 1 | 16.0 | Small Business | Male | 2 | 1.0 | Basic | 3.0 | Married | 2.0 | 0 | 5 | 1 | 1.0 | Executive | 19829.0 |
df.TypeofContact.value_counts() / len(df.TypeofContact)
Self Enquiry       0.704583
Company Invited    0.290303
Name: TypeofContact, dtype: float64
As seen in the EDA, TypeofContact shows no clear pattern with the other variables. Since 70.5% of customers are Self Enquiry and 29.0% are Company Invited, we will assign the 25 customers with a missing value (0.5% of the dataset) to Self Enquiry, the most frequent category.
# replacing NA values in TypeofContact with Self Enquiry
# (assigning the result back avoids the deprecated in-place fill on a column selection)
df["TypeofContact"] = df["TypeofContact"].fillna("Self Enquiry")
df[df["TypeofContact"].isnull()]
| ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
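The fill value above is hard-coded. As a small sketch, assuming a toy Series and the hypothetical names `contact`, `most_frequent` and `contact_filled`, the most frequent category can instead be read from the data, so the code stays correct if the distribution changes.

```python
import numpy as np
import pandas as pd

# Toy column (hypothetical values) with one missing entry.
contact = pd.Series(
    ["Self Enquiry", "Company Invited", "Self Enquiry", np.nan, "Self Enquiry"],
    name="TypeofContact",
)

# mode() ignores missing values by default; take the first mode if there are ties.
most_frequent = contact.mode(dropna=True).iloc[0]
contact_filled = contact.fillna(most_frequent)
print(most_frequent, contact_filled.isna().sum())
```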
# Checking that all missing values have been filled in
df.isnull().sum().sort_values(ascending=False)
MonthlyIncome               0
NumberOfFollowups           0
Age                         0
TypeofContact               0
CityTier                    0
DurationOfPitch             0
Occupation                  0
Gender                      0
NumberOfPersonVisiting      0
ProductPitched              0
Designation                 0
PreferredPropertyStar       0
MaritalStatus               0
NumberOfTrips               0
Passport                    0
PitchSatisfactionScore      0
OwnCar                      0
NumberOfChildrenVisiting    0
ProdTaken                   0
dtype: int64
# To find the 25th percentile and 75th percentile.
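# numeric_only=True keeps the call working on recent pandas versions,
# where quantile no longer silently skips non-numeric columns
Q1 = df.quantile(0.25, numeric_only=True)
Q3 = df.quantile(0.75, numeric_only=True)
# Interquartile range (75th percentile - 25th percentile)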
IQR = Q3 - Q1
# Finding lower and upper bounds for all values. All values outside these bounds are outliers
lower = Q1 - 1.5 * IQR
upper = Q3 + 1.5 * IQR
(
(df.select_dtypes(include=["float64", "int64"]) < lower)
| (df.select_dtypes(include=["float64", "int64"]) > upper)
).sum() / len(df) * 100
ProdTaken                   18.821604
Age                          0.000000
CityTier                     0.000000
DurationOfPitch              2.250409
NumberOfPersonVisiting       0.061375
NumberOfFollowups            6.382979
PreferredPropertyStar        0.000000
NumberOfTrips                2.229951
Passport                     0.000000
PitchSatisfactionScore       0.000000
OwnCar                       0.000000
NumberOfChildrenVisiting     0.000000
MonthlyIncome                7.221768
dtype: float64
We will not treat these outliers: they are plausible values for real customers, and we want our model to learn the underlying pattern for such customers.
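The 18.8% reported for ProdTaken is not a real outlier rate. When fewer than 25% of the values in a binary column are 1, both quartiles are 0, the IQR is 0, and every 1 falls outside the bounds. A minimal sketch of that arithmetic, assuming a toy target with 2 positives out of 10 (the names `target`, `flagged` and `pct_flagged` are illustrative only):

```python
import pandas as pd

# Hypothetical binary target: 2 positives out of 10 (20%).
target = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 1, 1], name="ProdTaken")

q1, q3 = target.quantile(0.25), target.quantile(0.75)
iqr = q3 - q1  # both quartiles are 0, so the IQR is 0

# With IQR = 0 the bounds collapse to [0, 0] and every 1 is flagged.
flagged = (target < q1 - 1.5 * iqr) | (target > q3 + 1.5 * iqr)
pct_flagged = flagged.mean() * 100
print(iqr, pct_flagged)
```

The flagged share simply equals the share of positives, which is why the figure for ProdTaken matches the purchase rate. Binary columns are best left out of an IQR check.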
# Data Preparation:
X = df.drop(
    ["ProdTaken"], axis=1
)  # X holds all independent variables; the target variable is removed
X = pd.get_dummies(
    X, drop_first=True
)  # create dummies for all categorical variables and drop the first level
y = df["ProdTaken"]
# Partition the data into train and test set:
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.3, random_state=1, stratify=y
) # using stratify because of imbalance distribution of the target classes
print(f"X_Train: {X_train.shape},\nX_test: {X_test.shape}")
X_Train: (3421, 28), X_test: (1467, 28)
# Checking the % distribution of the target classes
y.value_counts(1)
0    0.811784
1    0.188216
Name: ProdTaken, dtype: float64
y_train.value_counts(1)
0    0.811751
1    0.188249
Name: ProdTaken, dtype: float64
y_test.value_counts(1)
0    0.811861
1    0.188139
Name: ProdTaken, dtype: float64
We can see that train and test kept the proportion of the target classes (0 ~ 81% ; 1 ~ 19%)
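As a small illustration of what `stratify` does, the sketch below splits synthetic labels with roughly the same 19% positive rate and checks the proportion in each part. The arrays `X_demo` and `y_demo` are made up for the example and are not the notebook's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels (hypothetical): 190 positives out of 1000, i.e. 19%.
y_demo = np.array([1] * 190 + [0] * 810)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# stratify=y_demo keeps the class ratio the same in both parts.
_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=1, stratify=y_demo
)
print(round(y_tr.mean(), 3), round(y_te.mean(), 3))
```

Without `stratify`, the positive rate in a 30% test split could drift by a point or two from one random seed to another, which makes recall comparisons noisier.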
Let's define a function that provides metric scores (accuracy, recall, precision and F1) on the train and test sets, and a function that shows the confusion matrix, so that we do not have to repeat the same code while evaluating models.
## Function to calculate different metric scores of the model - Accuracy, Recall, Precision and F1
def get_metrics_score(model, flag=True):
    """
    model : classifier to predict values of X
    flag  : if True (default), the scores are also printed

    Returns [train_acc, test_acc, train_recall, test_recall,
             train_precision, test_precision, train_F1, test_F1].
    """
    # Predicting on train and test sets
    pred_train = model.predict(X_train)
    pred_test = model.predict(X_test)
    # Accuracy of the model
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    # Recall of the model
    train_recall = metrics.recall_score(y_train, pred_train)
    test_recall = metrics.recall_score(y_test, pred_test)
    # Precision of the model
    train_precision = metrics.precision_score(y_train, pred_train)
    test_precision = metrics.precision_score(y_test, pred_test)
    # F1-score of the model
    train_F1 = metrics.f1_score(y_train, pred_train)
    test_F1 = metrics.f1_score(y_test, pred_test)
    # list with the train and test results
    score_list = [
        train_acc,
        test_acc,
        train_recall,
        test_recall,
        train_precision,
        test_precision,
        train_F1,
        test_F1,
    ]
    # The scores are printed only when flag is True (the default).
    if flag:
        print("Accuracy on training set : {:.3f}".format(train_acc))
        print("Accuracy on test set : {:.3f}".format(test_acc))
        print("Recall on training set : {:.3f}".format(train_recall))
        print("Recall on test set : {:.3f}".format(test_recall))
        print("Precision on training set : {:.3f}".format(train_precision))
        print("Precision on test set : {:.3f}".format(test_precision))
        print("F1_score on training set : {:.3f}".format(train_F1))
        print("F1_score on test set : {:.3f}".format(test_F1))
    return score_list  # returning the list with train and test scores
def confusion_matrix_sklearn(model, predictors, target):
    """
    To plot the confusion_matrix with percentages
    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    y_pred = model.predict(predictors)
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
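To make the link between the confusion matrix and the four scores explicit, here is a short worked example. The counts are hypothetical and only illustrate the formulas; the layout follows scikit-learn's convention of true labels in rows and predicted labels in columns.

```python
import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted class.
cm = np.array([[80, 10],
               [4, 6]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()  # share of all predictions that are correct
recall = tp / (tp + fn)  # share of actual buyers that were found
precision = tp / (tp + fp)  # share of predicted buyers that really buy
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
print(accuracy, recall, precision, round(f1, 3))
```

With an imbalanced target, accuracy can look high (0.86 here) while recall stays modest (0.6), which is why recall is tracked separately in this notebook.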
We will build our model using the DecisionTreeClassifier function, with the default 'gini' criterion to split.
Because our data is imbalanced, class 0 is the dominant class and the decision tree would become biased toward it. So we will pass the dictionary {0:0.19,1:0.81} to the model to specify the weight of each class, so that the decision tree gives more weight to class 1.
class_weight is a hyperparameter of the decision tree classifier.
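The weights {0:0.19,1:0.81} are the class proportions swapped between the two classes. As a sketch, assuming synthetic labels with an 81/19 split (the names `y_demo`, `freq` and `class_weight` are illustrative), they can be derived from the data instead of being typed by hand:

```python
import pandas as pd

# Hypothetical labels with the same roughly 81/19 split as ProdTaken.
y_demo = pd.Series([0] * 81 + [1] * 19)

# Share of each class in the data.
freq = y_demo.value_counts(normalize=True)

# Weight each class by the share of the other class, so the rare class weighs more.
class_weight = {int(cls): round(float(1 - share), 2) for cls, share in freq.items()}
print(class_weight)
```

scikit-learn also accepts `class_weight="balanced"`, which uses inverse class frequencies and leads to the same relative weighting between the two classes.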
dtree = DecisionTreeClassifier(
criterion="gini", class_weight={0: 0.19, 1: 0.81}, random_state=1
)
dtree.fit(X_train, y_train)
DecisionTreeClassifier(class_weight={0: 0.19, 1: 0.81}, random_state=1)
confusion_matrix_sklearn(dtree, X_test, y_test)
dtree_model_perf = get_metrics_score(dtree, True)
Accuracy on training set : 1.000
Accuracy on test set : 0.885
Recall on training set : 1.000
Recall on test set : 0.714
Precision on training set : 1.000
Precision on test set : 0.689
F1_score on training set : 1.000
F1_score on test set : 0.701
The decision tree scores perfectly on the training data, but not on the test data. We can see that our Decision Tree is overfitting.
# Building our model with default bagging classifier
bagging = BaggingClassifier(random_state=1)
# Fitting our model on our training dataset
bagging.fit(X_train, y_train)
BaggingClassifier(random_state=1)
confusion_matrix_sklearn(bagging, X_test, y_test)
bagging_model_perf = get_metrics_score(bagging)
Accuracy on training set : 0.994
Accuracy on test set : 0.905
Recall on training set : 0.969
Recall on test set : 0.587
Precision on training set : 0.998
Precision on test set : 0.866
F1_score on training set : 0.983
F1_score on test set : 0.700
The bagging classifier is overfitting the training data and is performing poorly on the test data, mainly in the recall metric.
# Bagging with a class-weighted decision tree as base estimator
# Note: scikit-learn 1.2 renamed the `base_estimator` argument to `estimator`
bagging_wt = BaggingClassifier(
    base_estimator=DecisionTreeClassifier(
        criterion="gini", class_weight={0: 0.19, 1: 0.81}, random_state=1
    ),
    random_state=1,
)
bagging_wt.fit(X_train, y_train)
BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight={0: 0.19,
1: 0.81},
random_state=1),
random_state=1)
confusion_matrix_sklearn(bagging_wt, X_test, y_test)
bagging_wt_model_perf = get_metrics_score(bagging_wt)
Accuracy on training set : 0.994
Accuracy on test set : 0.898
Recall on training set : 0.967
Recall on test set : 0.547
Precision on training set : 0.998
Precision on test set : 0.858
F1_score on training set : 0.983
F1_score on test set : 0.668
The weighted bagging classifier is also overfitting the training data and is performing poorly on the test data, mainly in the recall metric.
rf = RandomForestClassifier(random_state=1)
rf.fit(X_train, y_train)
RandomForestClassifier(random_state=1)
confusion_matrix_sklearn(rf, X_test, y_test)
rf_model_perf = get_metrics_score(rf)
Accuracy on training set : 1.000
Accuracy on test set : 0.915
Recall on training set : 1.000
Recall on test set : 0.583
Precision on training set : 1.000
Precision on test set : 0.942
F1_score on training set : 1.000
F1_score on test set : 0.720
The random forest achieves high accuracy and precision, but it is not able to generalize well on the test data in terms of recall.
rf_wt = RandomForestClassifier(class_weight={0: 0.19, 1: 0.81}, random_state=1)
rf_wt.fit(X_train, y_train)
RandomForestClassifier(class_weight={0: 0.19, 1: 0.81}, random_state=1)
confusion_matrix_sklearn(rf_wt, X_test, y_test)
rf_wt_model_perf = get_metrics_score(rf_wt)
Accuracy on training set : 1.000
Accuracy on test set : 0.911
Recall on training set : 1.000
Recall on test set : 0.547
Precision on training set : 1.000
Precision on test set : 0.962
F1_score on training set : 1.000
F1_score on test set : 0.697
The weighted random forest has slightly lower test recall (0.547 vs 0.583) and slightly higher test precision (0.962 vs 0.942) than the default random forest, hence class weighting brings little improvement over the unweighted random forest.
Tuning Decision Tree
# Choose the type of classifier.
dtree_estimator = DecisionTreeClassifier(
class_weight={0: 0.19, 1: 0.81}, random_state=1
)
# Grid of parameters to choose from
parameters = {
"max_depth": np.arange(2, 30),
"min_samples_leaf": [1, 2, 5, 7, 10],
"max_leaf_nodes": [2, 3, 5, 10, 15],
"min_impurity_decrease": [0.0001, 0.001, 0.01, 0.1],
}
# Type of scoring used to compare parameter combinations
scorer = metrics.make_scorer(metrics.recall_score)
# Run the grid search
grid_obj = GridSearchCV(dtree_estimator, parameters, scoring=scorer)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
dtree_estimator = grid_obj.best_estimator_
# Fit the best algorithm to the data.
dtree_estimator.fit(X_train, y_train)
DecisionTreeClassifier(class_weight={0: 0.19, 1: 0.81}, max_depth=7,
max_leaf_nodes=15, min_impurity_decrease=0.0001,
min_samples_leaf=10, random_state=1)
confusion_matrix_sklearn(dtree_estimator, X_test, y_test)
dtree_estimator_model_perf = get_metrics_score(dtree_estimator)
Accuracy on training set : 0.796
Accuracy on test set : 0.803
Recall on training set : 0.648
Recall on test set : 0.663
Precision on training set : 0.469
Precision on test set : 0.483
F1_score on training set : 0.544
F1_score on test set : 0.559
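The search above can be read as: try every parameter combination, score each by cross-validated recall, and keep the best. The sketch below reproduces that pattern on synthetic data with a much smaller grid. `X_demo`, `y_demo` and `search` are illustrative names, and the data comes from `make_classification`, not from the travel dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced data (hypothetical stand-in for X_train / y_train).
X_demo, y_demo = make_classification(
    n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=1
)

search = GridSearchCV(
    DecisionTreeClassifier(class_weight={0: 0.19, 1: 0.81}, random_state=1),
    param_grid={"max_depth": [2, 4, 6], "min_samples_leaf": [1, 5, 10]},
    scoring="recall",  # string shortcut equivalent to make_scorer(recall_score)
    cv=5,
)
search.fit(X_demo, y_demo)

# best_score_ is the mean cross-validated recall of the winning combination.
print(search.best_params_, round(search.best_score_, 3))
```

Because `refit=True` is the default, `best_estimator_` is already fitted on the full training data, so calling `fit` on it again is harmless but not required.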
Tuning Bagging Classifier
# grid search for bagging classifier
cl1 = DecisionTreeClassifier(class_weight={0: 0.19, 1: 0.81}, random_state=1)
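param_grid = {
    "base_estimator": [cl1],
    "n_estimators": [5, 7, 15, 51, 101],
    # Note: the integer 1 means one feature per estimator, not 100% of the
    # features; the float 1.0 would make every estimator use all features.
    "max_features": [0.7, 0.8, 0.9, 1],
}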
grid = GridSearchCV(
BaggingClassifier(random_state=1, bootstrap=True),
param_grid=param_grid,
scoring="recall",
cv=5,
)
grid.fit(X_train, y_train)
GridSearchCV(cv=5, estimator=BaggingClassifier(random_state=1),
param_grid={'base_estimator': [DecisionTreeClassifier(class_weight={0: 0.19,
1: 0.81},
random_state=1)],
'max_features': [0.7, 0.8, 0.9, 1],
'n_estimators': [5, 7, 15, 51, 101]},
scoring='recall')
## getting the best estimator
bagging_estimator = grid.best_estimator_
bagging_estimator.fit(X_train, y_train)
BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight={0: 0.19,
1: 0.81},
random_state=1),
max_features=1, n_estimators=15, random_state=1)
confusion_matrix_sklearn(bagging_estimator, X_test, y_test)
bagging_estimator_model_perf = get_metrics_score(bagging_estimator)
Accuracy on training set : 0.716
Accuracy on test set : 0.714
Recall on training set : 0.666
Recall on test set : 0.703
Precision on training set : 0.362
Precision on test set : 0.365
F1_score on training set : 0.469
F1_score on test set : 0.480
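The selected estimator reports `max_features=1`. In BaggingClassifier an integer is a feature count while a float is a fraction, so this model trains each tree on a single feature, which likely explains the low precision. A minimal sketch on synthetic data, using the default base estimator and the illustrative names `bag_int` and `bag_float`, shows the difference:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

# Synthetic data with 10 features (hypothetical, for illustration only).
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=1)

# Integer 1: each estimator is trained on exactly one feature.
bag_int = BaggingClassifier(n_estimators=5, max_features=1, random_state=1)
bag_int.fit(X_demo, y_demo)

# Float 1.0: each estimator is trained on 100% of the features.
bag_float = BaggingClassifier(n_estimators=5, max_features=1.0, random_state=1)
bag_float.fit(X_demo, y_demo)

n_int = len(bag_int.estimators_features_[0])
n_float = len(bag_float.estimators_features_[0])
print(n_int, n_float)
```

If the intent of the grid was "70%, 80%, 90% or all features", the last value would need to be written as `1.0`.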
Tuning Random Forest
# Choose the type of classifier.
rf_estimator = RandomForestClassifier(random_state=1)
# Grid of parameters to choose from
parameters = {
"class_weight": [{0: 0.19, 1: 0.81}],
"n_estimators": [100, 150, 200, 250],
"min_samples_leaf": np.arange(5, 10),
"max_features": np.arange(0.2, 0.7, 0.1),
"max_samples": np.arange(0.3, 0.7, 0.1),
}
# Run the grid search
grid_obj = GridSearchCV(rf_estimator, parameters, scoring="recall", cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
rf_estimator = grid_obj.best_estimator_
# Fit the best algorithm to the data.
rf_estimator.fit(X_train, y_train)
RandomForestClassifier(class_weight={0: 0.19, 1: 0.81}, max_features=0.2,
max_samples=0.6000000000000001, min_samples_leaf=8,
random_state=1)
confusion_matrix_sklearn(rf_estimator, X_test, y_test)
rf_estimator_model_perf = get_metrics_score(rf_estimator)
Accuracy on training set : 0.903
Accuracy on test set : 0.869
Recall on training set : 0.848
Recall on test set : 0.674
Precision on training set : 0.699
Precision on test set : 0.646
F1_score on training set : 0.766
F1_score on test set : 0.660
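The value `max_samples=0.6000000000000001` in the output comes from building the grid with `np.arange` and a float step, which accumulates floating-point error. A small sketch of a cleaner grid, assuming that rounding to one decimal is acceptable for these hyperparameters:

```python
import numpy as np

# Raw float grid, as in the search above; the last value carries rounding error.
raw_grid = np.arange(0.3, 0.7, 0.1)

# Rounding to one decimal gives clean values for reporting and comparison.
clean_grid = np.round(raw_grid, 1)
print(raw_grid.tolist())
print(clean_grid.tolist())
```

The model itself is not affected by the tiny error. The rounding only makes the reported best parameters easier to read and to reuse.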
# defining list of models
models = [
dtree,
dtree_estimator,
bagging,
bagging_wt,
bagging_estimator,
rf,
rf_wt,
rf_estimator,
]
# defining empty lists to add train and test results
acc_train = []
acc_test = []
recall_train = []
recall_test = []
precision_train = []
precision_test = []
F1_score_train = []
F1_score_test = []
# looping through all the models to get the accuracy, recall, precision and F1 scores
for model in models:
    j = get_metrics_score(model, False)
    acc_train.append(np.round(j[0], 3))
    acc_test.append(np.round(j[1], 3))
    recall_train.append(np.round(j[2], 3))
    recall_test.append(np.round(j[3], 3))
    precision_train.append(np.round(j[4], 3))
    precision_test.append(np.round(j[5], 3))
    F1_score_train.append(np.round(j[6], 3))
    F1_score_test.append(np.round(j[7], 3))
comparison_frame = pd.DataFrame(
    {
        "Model": [
            "Decision Tree - default parameters",
            "Decision Tree - Tuned",
            "Bagging Classifier - default parameters",
            "Bagging Classifier - weighted",
            "Bagging Classifier - Tuned",
            "Random Forest - default parameters",
            "Random Forest - weighted",
            "Random Forest - Tuned",
        ],
        "Train_Accuracy": acc_train,
        "Test_Accuracy": acc_test,
        "Train_Recall": recall_train,
        "Test_Recall": recall_test,
        "Train_Precision": precision_train,
        "Test_Precision": precision_test,
        "Train_F1_Score": F1_score_train,
        "Test_F1_Score": F1_score_test,
    }
)
comparison_frame
| | Model | Train_Accuracy | Test_Accuracy | Train_Recall | Test_Recall | Train_Precision | Test_Precision | Train_F1_Score | Test_F1_Score |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Decision Tree - default parameters | 1.000 | 0.885 | 1.000 | 0.714 | 1.000 | 0.689 | 1.000 | 0.701 |
| 1 | Decision Tree - Tuned | 0.796 | 0.803 | 0.648 | 0.663 | 0.469 | 0.483 | 0.544 | 0.559 |
| 2 | Bagging Classifier - default parameters | 0.994 | 0.905 | 0.969 | 0.587 | 0.998 | 0.866 | 0.983 | 0.700 |
| 3 | Bagging Classifier - weighted | 0.994 | 0.898 | 0.967 | 0.547 | 0.998 | 0.858 | 0.983 | 0.668 |
| 4 | Bagging Classifier - Tuned | 0.716 | 0.714 | 0.666 | 0.703 | 0.362 | 0.365 | 0.469 | 0.480 |
| 5 | Random Forest - default parameters | 1.000 | 0.915 | 1.000 | 0.583 | 1.000 | 0.942 | 1.000 | 0.720 |
| 6 | Random Forest - weighted | 1.000 | 0.911 | 1.000 | 0.547 | 1.000 | 0.962 | 1.000 | 0.697 |
| 7 | Random Forest - Tuned | 0.903 | 0.869 | 0.848 | 0.674 | 0.699 | 0.646 | 0.766 | 0.660 |
All models are performing poorly and/or overfitting.
The tuned Random Forest reduces the overfitting and gives the most balanced performance across metrics (recall, precision and F1 score) among these models. It still needs improvement, considering that its test recall is only 0.674.
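One way to make this comparison concrete is to rank the models by test recall and to look at the gap between train and test recall as a rough overfitting signal. The sketch below copies the recall and F1 figures from the comparison table. The column `Recall_Gap` and the names `comparison` and `ranked` are additions for illustration.

```python
import pandas as pd

# Figures copied from the comparison table above.
comparison = pd.DataFrame(
    {
        "Model": [
            "Decision Tree - default parameters",
            "Decision Tree - Tuned",
            "Bagging Classifier - default parameters",
            "Bagging Classifier - weighted",
            "Bagging Classifier - Tuned",
            "Random Forest - default parameters",
            "Random Forest - weighted",
            "Random Forest - Tuned",
        ],
        "Train_Recall": [1.000, 0.648, 0.969, 0.967, 0.666, 1.000, 1.000, 0.848],
        "Test_Recall": [0.714, 0.663, 0.587, 0.547, 0.703, 0.583, 0.547, 0.674],
        "Test_F1_Score": [0.701, 0.559, 0.700, 0.668, 0.480, 0.720, 0.697, 0.660],
    }
).set_index("Model")

# A large positive gap means the model recalls far more on train than on test.
comparison["Recall_Gap"] = (
    comparison["Train_Recall"] - comparison["Test_Recall"]
).round(3)

ranked = comparison.sort_values("Test_Recall", ascending=False)
print(ranked)
```

Read this way, the default decision tree has the highest test recall but a large gap, while the tuned models trade some recall or precision for a much smaller gap.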
# importance of features in the tuned Random Forest
print(
pd.DataFrame(
rf_estimator.feature_importances_, columns=["Imp"], index=X_train.columns
).sort_values(by="Imp", ascending=False)
)
                                  Imp
Passport                     0.125679
MonthlyIncome                0.117081
Age                          0.113792
DurationOfPitch              0.089557
Designation_Executive        0.057541
NumberOfTrips                0.049538
CityTier                     0.048926
PitchSatisfactionScore       0.047908
NumberOfFollowups            0.041924
PreferredPropertyStar        0.039876
MaritalStatus_Single         0.033724
MaritalStatus_Married        0.029896
Designation_Manager          0.022138
Gender_Male                  0.020064
NumberOfChildrenVisiting     0.017322
OwnCar                       0.016488
Occupation_Salaried          0.015689
TypeofContact_Self Enquiry   0.015494
NumberOfPersonVisiting       0.015404
MaritalStatus_Unmarried      0.014788
Occupation_Small Business    0.014505
ProductPitched_Deluxe        0.012871
ProductPitched_Super Deluxe  0.011050
Occupation_Large Business    0.009240
Designation_Senior Manager   0.007464
ProductPitched_Standard      0.007243
ProductPitched_King          0.002440
Designation_VP               0.002359
feature_names = X_train.columns
importances = rf_estimator.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
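Because categorical variables were one-hot encoded, their importance is spread over several dummy columns. The sketch below sums the dummy importances back to the original variable, using a subset of the values printed above. It assumes that the text before the first underscore identifies the source column, which holds for the column names in this dataset. The names `imp`, `source` and `grouped` are illustrative.

```python
import pandas as pd

# Subset of the importances printed above (values copied from that output).
imp = pd.Series(
    {
        "Passport": 0.125679,
        "MonthlyIncome": 0.117081,
        "Designation_Executive": 0.057541,
        "Designation_Manager": 0.022138,
        "Designation_Senior Manager": 0.007464,
        "Designation_VP": 0.002359,
        "MaritalStatus_Single": 0.033724,
        "MaritalStatus_Married": 0.029896,
        "MaritalStatus_Unmarried": 0.014788,
    }
)

# Map each dummy column back to its source variable (text before the first "_").
source = imp.index.str.split("_").str[0].to_numpy()

# Total importance per original variable, largest first.
grouped = imp.groupby(source).sum().sort_values(ascending=False)
print(grouped)
```

On this subset, Designation as a whole adds up to about 0.09, which places it next to DurationOfPitch rather than in the middle of the ranking.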
abc = AdaBoostClassifier(random_state=1)
abc.fit(X_train, y_train)
AdaBoostClassifier(random_state=1)
confusion_matrix_sklearn(abc, X_test, y_test)
# Using above defined function to get accuracy, recall and precision on train and test set
abc_score = get_metrics_score(abc)
Accuracy on training set : 0.849
Accuracy on test set : 0.847
Recall on training set : 0.331
Recall on test set : 0.322
Precision on training set : 0.712
Precision on test set : 0.701
F1_score on training set : 0.452
F1_score on test set : 0.442
gbc = GradientBoostingClassifier(random_state=1)
gbc.fit(X_train, y_train)
GradientBoostingClassifier(random_state=1)
confusion_matrix_sklearn(gbc, X_test, y_test)
# Using above defined function to get accuracy, recall and precision on train and test set
gbc_score = get_metrics_score(gbc)
Accuracy on training set : 0.888
Accuracy on test set : 0.868
Recall on training set : 0.449
Recall on test set : 0.395
Precision on training set : 0.909
Precision on test set : 0.807
F1_score on training set : 0.601
F1_score on test set : 0.530
xgb = XGBClassifier(random_state=1, eval_metric="error")
xgb.fit(X_train, y_train)
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, eval_metric='error',
gamma=0, gpu_id=-1, importance_type='gain',
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, random_state=1, reg_alpha=0, reg_lambda=1,
scale_pos_weight=1, subsample=1, tree_method='exact',
validate_parameters=1, verbosity=None)
confusion_matrix_sklearn(xgb, X_test, y_test)
# Using above defined function to get accuracy, recall and precision on train and test set
xgb_score = get_metrics_score(xgb)
Accuracy on training set : 1.000
Accuracy on test set : 0.926
Recall on training set : 0.998
Recall on test set : 0.692
Precision on training set : 1.000
Precision on test set : 0.888
F1_score on training set : 0.999
F1_score on test set : 0.778
Since XGBoost handles missing values natively, we will check the performance of the model with the data before missing-value treatment.
# Data (before missing values treatment) Preparation:
X_xg = df_XG.drop(
    ["ProdTaken"], axis=1
)  # X_xg holds all independent variables; the target variable is removed
X_xg = pd.get_dummies(
    X_xg, drop_first=True
)  # create dummies for all categorical variables and drop the first level
y_xg = df_XG["ProdTaken"]
# Partition the data into train and test set:
X_xg_train, X_xg_test, y_xg_train, y_xg_test = train_test_split(
X_xg, y_xg, test_size=0.3, random_state=1, stratify=y_xg
) # using stratify because of imbalance distribution of the target classes
print(f"X_Train: {X_xg_train.shape},\nX_test: {X_xg_test.shape}")
X_Train: (3421, 28), X_test: (1467, 28)
# Checking the % distribution of the target classes
y_xg.value_counts(1)
0    0.811784
1    0.188216
Name: ProdTaken, dtype: float64
y_xg_train.value_counts(1)
0    0.811751
1    0.188249
Name: ProdTaken, dtype: float64
y_xg_test.value_counts(1)
0    0.811861
1    0.188139
Name: ProdTaken, dtype: float64
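One caveat with preparing untreated data this way: `pd.get_dummies` does not propagate missing values in a categorical column. A row with a missing TypeofContact gets 0 in every dummy and becomes indistinguishable from the dropped baseline category, so the model never sees it as missing. The sketch below shows this on a three-row toy frame and the `dummy_na=True` option that keeps an explicit indicator. The names `raw`, `plain` and `with_flag` are illustrative.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical rows) with one missing categorical value.
raw = pd.DataFrame({"TypeofContact": ["Self Enquiry", "Company Invited", np.nan]})

# Default behaviour: the missing row is encoded as all zeros,
# exactly like the dropped baseline "Company Invited".
plain = pd.get_dummies(raw, drop_first=True).astype(int)

# dummy_na=True adds a separate indicator column for missing values.
with_flag = pd.get_dummies(raw, drop_first=True, dummy_na=True).astype(int)

print(plain)
print(with_flag)
```

Missing values in numeric columns are unaffected by this and reach the model as NaN.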
## Function to calculate different metric scores of the model - Accuracy, Recall and Precision for XGBOOSTING BEFORE missing values treatment
def get_metrics_score_2(model, flag=True):
"""
model : classifier to predict values of X
"""
# defining an empty list to store train and test results
score_list = []
# Predicting on train and tests
pred_train = model.predict(X_xg_train)
pred_test = model.predict(X_xg_test)
# Accuracy of the model
train_acc = model.score(X_xg_train, y_xg_train)
test_acc = model.score(X_xg_test, y_xg_test)
# Recall of the model
train_recall = metrics.recall_score(y_xg_train, pred_train)
test_recall = metrics.recall_score(y_xg_test, pred_test)
# Precision of the model
train_precision = metrics.precision_score(y_xg_train, pred_train)
test_precision = metrics.precision_score(y_xg_test, pred_test)
# F1-score of the model
train_F1 = metrics.f1_score(y_xg_train, pred_train)
test_F1 = metrics.f1_score(y_xg_test, pred_test)
score_list.extend(
(
train_acc,
test_acc,
train_recall,
test_recall,
train_precision,
test_precision,
train_F1,
test_F1,
)
)
# The following print statements are displayed only if flag is True (the default).
if flag:
print(
"Accuracy on training set : {:.3f}".format(
model.score(X_xg_train, y_xg_train)
)
)
print("Accuracy on test set : {:.3f}".format(model.score(X_xg_test, y_xg_test)))
print(
"Recall on training set : {:.3f}".format(
metrics.recall_score(y_xg_train, pred_train)
)
)
print(
"Recall on test set : {:.3f}".format(
metrics.recall_score(y_xg_test, pred_test)
)
)
print(
"Precision on training set : {:.3f}".format(
metrics.precision_score(y_xg_train, pred_train)
)
)
print(
"Precision on test set : {:.3f}".format(
metrics.precision_score(y_xg_test, pred_test)
)
)
print(
"F1_score on training set : {:.3f}".format(
metrics.f1_score(y_xg_train, pred_train)
)
)
print(
"F1_score on test set : {:.3f}".format(
metrics.f1_score(y_xg_test, pred_test)
)
)
return score_list # returning the list with train and test scores
xgb_2 = XGBClassifier(random_state=1, eval_metric="error")
xgb_2.fit(X_xg_train, y_xg_train)
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, eval_metric='error',
gamma=0, gpu_id=-1, importance_type='gain',
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, random_state=1, reg_alpha=0, reg_lambda=1,
scale_pos_weight=1, subsample=1, tree_method='exact',
validate_parameters=1, verbosity=None)
confusion_matrix_sklearn(xgb_2, X_xg_test, y_xg_test)
# Using above defined function to get accuracy, recall and precision on train and test set
xgb_2_score = get_metrics_score_2(xgb_2)
Accuracy on training set : 1.000 Accuracy on test set : 0.928 Recall on training set : 0.998 Recall on test set : 0.710 Precision on training set : 1.000 Precision on test set : 0.883 F1_score on training set : 0.999 F1_score on test set : 0.787
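The F1 score is the harmonic mean of precision and recall, so the test F1 values reported for the two default XGBoost models can be cross-checked from the other two printed metrics. The sketch below uses only the rounded numbers from the outputs above, so small differences in the third decimal are expected; the helper name `f1_from` is introduced here for illustration.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (test precision, test recall, reported test F1) copied from the printed outputs.
reported = {
    "XGB default, treated data": (0.888, 0.692, 0.778),
    "XGB default, untreated data": (0.883, 0.710, 0.787),
}

for name, (precision, recall, f1_reported) in reported.items():
    f1_check = f1_from(precision, recall)
    print(f"{name}: recomputed F1 = {f1_check:.3f}, reported F1 = {f1_reported:.3f}")
```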
The XGBoost model trained without missing values treatment still overfits on recall and F1 score, but it performs better than the other models so far. It also performs slightly better than the default XGBoost model trained on the treated data: test recall is 0.710 versus 0.692.
# Choose the type of classifier.
abc_tuned = AdaBoostClassifier(random_state=1)
# Grid of parameters to choose from
## add from article
parameters = {
# Let's try different max_depth for base_estimator
"base_estimator": [
DecisionTreeClassifier(max_depth=1),
DecisionTreeClassifier(max_depth=2),
DecisionTreeClassifier(max_depth=3),
],
"n_estimators": np.arange(10, 110, 10),
"learning_rate": np.arange(0.1, 2, 0.1),
}
# Type of scoring used to compare parameter combinations
acc_scorer = metrics.make_scorer(metrics.recall_score)
# Run the grid search
grid_obj = GridSearchCV(abc_tuned, parameters, scoring=acc_scorer, cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
abc_tuned = grid_obj.best_estimator_
# Fit the best algorithm to the data.
abc_tuned.fit(X_train, y_train)
AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=3),
learning_rate=1.6, n_estimators=100, random_state=1)
confusion_matrix_sklearn(abc_tuned, X_test, y_test)
# Using above defined function to get accuracy, recall and precision on train and test set
abc_tuned_score = get_metrics_score(abc_tuned)
Accuracy on training set : 0.990 Accuracy on test set : 0.864 Recall on training set : 0.964 Recall on test set : 0.576 Precision on training set : 0.981 Precision on test set : 0.660 F1_score on training set : 0.973 F1_score on test set : 0.615
Gradient Boosting using an AdaBoost classifier as the estimator for initial predictions
gbc_init = GradientBoostingClassifier(
init=AdaBoostClassifier(random_state=1), random_state=1
)
gbc_init.fit(X_train, y_train)
GradientBoostingClassifier(init=AdaBoostClassifier(random_state=1),
random_state=1)
confusion_matrix_sklearn(gbc_init, X_test, y_test)
# Using above defined function to get accuracy, recall and precision on train and test set
gbc_init_score = get_metrics_score(gbc_init)
Accuracy on training set : 0.886 Accuracy on test set : 0.866 Recall on training set : 0.452 Recall on test set : 0.384 Precision on training set : 0.890 Precision on test set : 0.797 F1_score on training set : 0.599 F1_score on test set : 0.518
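As compared to the model with default parameters, using AdaBoost as the initial estimator brings no improvement: test recall (0.384 vs 0.395) and test F1 score (0.518 vs 0.530) are both slightly lower.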
# Choose the type of classifier.
gbc_tuned = GradientBoostingClassifier(
init=AdaBoostClassifier(random_state=1), random_state=1
)
# Grid of parameters to choose from
## add from article
parameters = {
"n_estimators": [100, 150, 200, 250],
"subsample": [0.8, 0.9, 1],
"max_features": [0.7, 0.8, 0.9, 1],
}
# Type of scoring used to compare parameter combinations
acc_scorer = metrics.make_scorer(metrics.recall_score)
# Run the grid search
grid_obj = GridSearchCV(gbc_tuned, parameters, scoring=acc_scorer, cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
gbc_tuned = grid_obj.best_estimator_
# Fit the best algorithm to the data.
gbc_tuned.fit(X_train, y_train)
GradientBoostingClassifier(init=AdaBoostClassifier(random_state=1),
max_features=0.9, n_estimators=250, random_state=1,
subsample=0.8)
confusion_matrix_sklearn(gbc_tuned, X_test, y_test)
# Using above defined function to get accuracy, recall and precision on train and test set
gbc_tuned_score = get_metrics_score(gbc_tuned)
Accuracy on training set : 0.923 Accuracy on test set : 0.880 Recall on training set : 0.623 Recall on test set : 0.482 Precision on training set : 0.948 Precision on test set : 0.801 F1_score on training set : 0.752 F1_score on test set : 0.602
# Choose the type of classifier.
xgb_tuned = XGBClassifier(eval_metric="error", random_state=1)
# Grid of parameters to choose from
## add from
parameters = {
"n_estimators": np.arange(10, 100, 20),
"subsample": [0.5, 0.7, 0.9, 1],
"learning_rate": [0.01, 0.1, 0.2, 0.05],
"gamma": [0, 1, 3],
"colsample_bytree": [0.5, 0.7, 0.9, 1],
"colsample_bylevel": [0.5, 0.7, 0.9, 1],
}
# Type of scoring used to compare parameter combinations
acc_scorer = metrics.make_scorer(metrics.recall_score)
# Run the grid search
grid_obj = GridSearchCV(xgb_tuned, parameters, scoring=acc_scorer, cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
xgb_tuned = grid_obj.best_estimator_
# Fit the best algorithm to the data.
xgb_tuned.fit(X_train, y_train)
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=0.9, eval_metric='error',
gamma=0, gpu_id=-1, importance_type='gain',
interaction_constraints='', learning_rate=0.2, max_delta_step=0,
max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=90, n_jobs=12,
num_parallel_tree=1, random_state=1, reg_alpha=0, reg_lambda=1,
scale_pos_weight=1, subsample=1, tree_method='exact',
validate_parameters=1, verbosity=None)
confusion_matrix_sklearn(xgb_tuned, X_test, y_test)
# Using above defined function to get accuracy, recall and precision on train and test set
xgb_tuned_score = get_metrics_score(xgb_tuned)
Accuracy on training set : 0.994 Accuracy on test set : 0.918 Recall on training set : 0.966 Recall on test set : 0.638 Precision on training set : 1.000 Precision on test set : 0.898 F1_score on training set : 0.983 F1_score on test set : 0.746
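An exhaustive grid search fits one model per parameter combination per fold, so the cost of the searches above grows multiplicatively with each added parameter. The sketch below counts the fits for the three grids defined so far. The value counts are read off the grids by hand (for example, `np.arange(0.1, 2, 0.1)` yields 19 values), and the dictionary labels are chosen here for illustration.

```python
from math import prod

CV_FOLDS = 5  # cv=5 in every GridSearchCV call above

# Number of candidate values per hyperparameter, read off the grids above.
grids = {
    "AdaBoost": {"base_estimator": 3, "n_estimators": 10, "learning_rate": 19},
    "Gradient Boosting": {"n_estimators": 4, "subsample": 3, "max_features": 4},
    "XGBoost (first grid)": {
        "n_estimators": 5,
        "subsample": 4,
        "learning_rate": 4,
        "gamma": 3,
        "colsample_bytree": 4,
        "colsample_bylevel": 4,
    },
}

# Combinations = product of value counts; fits = combinations x folds.
fit_counts = {}
for name, sizes in grids.items():
    combinations = prod(sizes.values())
    fit_counts[name] = (combinations, combinations * CV_FOLDS)
    print(f"{name}: {combinations} combinations, {combinations * CV_FOLDS} fits")
```

With the default `refit=True`, `GridSearchCV` already refits `best_estimator_` on the full training data, so the extra `.fit` call after each search repeats work that has been done.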
# Choose the type of classifier.
xgb_tuned2 = XGBClassifier(eval_metric="error", random_state=1)
# Grid of parameters to choose from
## add from
parameters = {
"n_estimators": [10, 30, 50],
"scale_pos_weight": [1, 2, 5],
"subsample": [0.7, 0.9, 1],
"learning_rate": [0.05, 0.1, 0.2],
"colsample_bytree": [0.7, 0.9, 1],
"colsample_bylevel": [0.5, 0.7, 1],
}
# Type of scoring used to compare parameter combinations
acc_scorer = metrics.make_scorer(metrics.recall_score)
# Run the grid search
grid_obj = GridSearchCV(xgb_tuned2, parameters, scoring=acc_scorer, cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
xgb_tuned2 = grid_obj.best_estimator_
# Fit the best algorithm to the data.
xgb_tuned2.fit(X_train, y_train)
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=0.7, eval_metric='error',
gamma=0, gpu_id=-1, importance_type='gain',
interaction_constraints='', learning_rate=0.1, max_delta_step=0,
max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=30, n_jobs=12,
num_parallel_tree=1, random_state=1, reg_alpha=0, reg_lambda=1,
scale_pos_weight=5, subsample=1, tree_method='exact',
validate_parameters=1, verbosity=None)
confusion_matrix_sklearn(xgb_tuned2, X_test, y_test)
# Using above defined function to get accuracy, recall and precision on train and test set
xgb_tuned2_score = get_metrics_score(xgb_tuned2)
Accuracy on training set : 0.906 Accuracy on test set : 0.863 Recall on training set : 0.935 Recall on test set : 0.815 Precision on training set : 0.684 Precision on test set : 0.600 F1_score on training set : 0.790 F1_score on test set : 0.691
importances = xgb_tuned2.feature_importances_
indices = np.argsort(importances)
feature_names = list(X.columns)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
# Choose the type of classifier.
xgb2_tuned = XGBClassifier(eval_metric="error", random_state=1)
# Grid of parameters to choose from
## add from
parameters = {
"n_estimators": np.arange(10, 100, 20),
"scale_pos_weight": [0, 1, 2, 5],
"subsample": [0.5, 0.7, 0.9, 1],
"learning_rate": [0.01, 0.1, 0.2, 0.05],
"gamma": [0, 1, 3],
"colsample_bytree": [0.5, 0.7, 0.9, 1],
"colsample_bylevel": [0.5, 0.7, 0.9, 1],
}
# Type of scoring used to compare parameter combinations
acc_scorer = metrics.make_scorer(metrics.recall_score)
# Run the grid search
grid_obj = GridSearchCV(xgb2_tuned, parameters, scoring=acc_scorer, cv=5)
grid_obj = grid_obj.fit(X_xg_train, y_xg_train)
# Set the clf to the best combination of parameters
xgb2_tuned = grid_obj.best_estimator_
# Fit the best algorithm to the data.
xgb2_tuned.fit(X_xg_train, y_xg_train)
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=0.9,
colsample_bynode=1, colsample_bytree=0.9, eval_metric='error',
gamma=3, gpu_id=-1, importance_type='gain',
interaction_constraints='', learning_rate=0.2, max_delta_step=0,
max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=30, n_jobs=12,
num_parallel_tree=1, random_state=1, reg_alpha=0, reg_lambda=1,
scale_pos_weight=5, subsample=1, tree_method='exact',
validate_parameters=1, verbosity=None)
confusion_matrix_sklearn(xgb2_tuned, X_xg_test, y_xg_test)
# Using above defined function to get accuracy, recall and precision on train and test set
xgb2_tuned_score = get_metrics_score_2(xgb2_tuned)
Accuracy on training set : 0.946 Accuracy on test set : 0.875 Recall on training set : 0.980 Recall on test set : 0.804 Precision on training set : 0.785 Precision on test set : 0.632 F1_score on training set : 0.872 F1_score on test set : 0.708
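The search selected `scale_pos_weight=5`. The XGBoost documentation suggests the ratio of negative to positive instances as a typical starting value for this parameter, and that ratio can be computed from the training class shares printed earlier. The sketch below is only a plausibility check on the selected value, not a replacement for the grid search.

```python
# Class shares of y_xg_train, copied from the value_counts output earlier.
neg_share, pos_share = 0.811751, 0.188249

# Ratio of non-buyers to buyers, the usual starting point for scale_pos_weight.
ratio = neg_share / pos_share
print(round(ratio, 2))

# Candidate values used in the grid for this model.
candidates = [0, 1, 2, 5]
closest = min(candidates, key=lambda w: abs(w - ratio))
print(closest)
```

The selected value of 5 is the grid candidate closest to this ratio, which is consistent with the jump in test recall once positive examples are weighted more heavily.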
# Choose the type of classifier.
xgb2_tuned2 = XGBClassifier(eval_metric="error", random_state=1)
# Grid of parameters to choose from
## add from
parameters = {
"n_estimators": [10, 30, 50],
"scale_pos_weight": [1, 2, 5],
"subsample": [0.7, 0.9, 1],
"learning_rate": [0.05, 0.1, 0.2],
"colsample_bytree": [0.7, 0.9, 1],
"colsample_bylevel": [0.5, 0.7, 1],
}
# Type of scoring used to compare parameter combinations
acc_scorer = metrics.make_scorer(metrics.recall_score)
# Run the grid search
grid_obj = GridSearchCV(xgb2_tuned2, parameters, scoring=acc_scorer, cv=5)
grid_obj = grid_obj.fit(X_xg_train, y_xg_train)
# Set the clf to the best combination of parameters
xgb2_tuned2 = grid_obj.best_estimator_
# Fit the best algorithm to the data.
xgb2_tuned2.fit(X_xg_train, y_xg_train)
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=0.7, eval_metric='error',
gamma=0, gpu_id=-1, importance_type='gain',
interaction_constraints='', learning_rate=0.1, max_delta_step=0,
max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=30, n_jobs=12,
num_parallel_tree=1, random_state=1, reg_alpha=0, reg_lambda=1,
scale_pos_weight=5, subsample=1, tree_method='exact',
validate_parameters=1, verbosity=None)
confusion_matrix_sklearn(xgb2_tuned2, X_xg_test, y_xg_test)
xgb2_tuned2_score = get_metrics_score_2(xgb2_tuned2)
Accuracy on training set : 0.909 Accuracy on test set : 0.862 Recall on training set : 0.929 Recall on test set : 0.826 Precision on training set : 0.694 Precision on test set : 0.595 F1_score on training set : 0.794 F1_score on test set : 0.692
importances = xgb2_tuned2.feature_importances_
indices = np.argsort(importances)
feature_names = list(X_xg.columns)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
estimators = [
("Random Forest", rf_estimator),
("Gradient Boosting", gbc_tuned),
("Decision Tree", dtree_estimator),
]
final_estimator = xgb_tuned
stacking_classifier = StackingClassifier(
estimators=estimators, final_estimator=final_estimator
)
stacking_classifier.fit(X_train, y_train)
StackingClassifier(estimators=[('Random Forest',
RandomForestClassifier(class_weight={0: 0.19,
1: 0.81},
max_features=0.2,
max_samples=0.6000000000000001,
min_samples_leaf=8,
random_state=1)),
('Gradient Boosting',
GradientBoostingClassifier(init=AdaBoostClassifier(random_state=1),
max_features=0.9,
n_estimators=250,
random_state=1,
subsample=0.8)),
('Decision Tree'...
eval_metric='error', gamma=0,
gpu_id=-1,
importance_type='gain',
interaction_constraints='',
learning_rate=0.2,
max_delta_step=0, max_depth=6,
min_child_weight=1,
missing=nan,
monotone_constraints='()',
n_estimators=90, n_jobs=12,
num_parallel_tree=1,
random_state=1, reg_alpha=0,
reg_lambda=1,
scale_pos_weight=1,
subsample=1,
tree_method='exact',
validate_parameters=1,
verbosity=None))
confusion_matrix_sklearn(stacking_classifier, X_test, y_test)
# Using above defined function to get accuracy, recall and precision on train and test set
stacking_classifier_score = get_metrics_score(stacking_classifier)
Accuracy on training set : 0.918 Accuracy on test set : 0.882 Recall on training set : 0.630 Recall on test set : 0.489 Precision on training set : 0.908 Precision on test set : 0.808 F1_score on training set : 0.744 F1_score on test set : 0.609
# defining list of models trained on X_train
models = [
abc,
abc_tuned,
gbc,
gbc_init,
gbc_tuned,
xgb,
xgb_tuned,
xgb_tuned2,
]
# defining list of models trained on X_xg_train
models_2 = [
xgb_2,
xgb2_tuned,
xgb2_tuned2,
]
# defining list of models trained on X_train (stacking)
models_3 = [
stacking_classifier,
]
# defining empty lists to add train and test results
acc_train = []
acc_test = []
recall_train = []
recall_test = []
precision_train = []
precision_test = []
F1_score_train = []
F1_score_test = []
# looping through all the models to get the accuracy, recall and precision scores
for model in models:
j = get_metrics_score(model, False)
acc_train.append(np.round(j[0], 3))
acc_test.append(np.round(j[1], 3))
recall_train.append(np.round(j[2], 3))
recall_test.append(np.round(j[3], 3))
precision_train.append(np.round(j[4], 3))
precision_test.append(np.round(j[5], 3))
F1_score_train.append(np.round(j[6], 3))
F1_score_test.append(np.round(j[7], 3))
# looping through all the models to get the accuracy, recall and precision scores
for model in models_2:
j = get_metrics_score_2(model, False)
acc_train.append(np.round(j[0], 3))
acc_test.append(np.round(j[1], 3))
recall_train.append(np.round(j[2], 3))
recall_test.append(np.round(j[3], 3))
precision_train.append(np.round(j[4], 3))
precision_test.append(np.round(j[5], 3))
F1_score_train.append(np.round(j[6], 3))
F1_score_test.append(np.round(j[7], 3))
# looping through all the models to get the accuracy, recall and precision scores
for model in models_3:
j = get_metrics_score(model, False)
acc_train.append(np.round(j[0], 3))
acc_test.append(np.round(j[1], 3))
recall_train.append(np.round(j[2], 3))
recall_test.append(np.round(j[3], 3))
precision_train.append(np.round(j[4], 3))
precision_test.append(np.round(j[5], 3))
F1_score_train.append(np.round(j[6], 3))
F1_score_test.append(np.round(j[7], 3))
comparison_frame2 = pd.DataFrame(
{
"Model": [
"AdaBoost - default parameters",
"AdaBoost - Tunned",
"Gradient Boosting - default parameters",
"Gradient Boosting - init",
"Gradient Boosting - Tunned",
"XGB - deafult parameters",
"XGB - Tunned",
"XGB2 - Tunned other parameters",
"XGB_2 - deafult (No treatment Missing values)",
"XGB_2 - Tunned (No treatment Missing values)",
"XGB2_2 - Tunned other parameters (No treatment Missing values)",
"stacking_classifier",
],
"Train_Accuracy": acc_train,
"Test_Accuracy": acc_test,
"Train_Recall": recall_train,
"Test_Recall": recall_test,
"Train_Precision": precision_train,
"Test_Precision": precision_test,
"Train_F1_Score": F1_score_train,
"Test_F1_Score": F1_score_test,
}
)
comparison_frame2
| | Model | Train_Accuracy | Test_Accuracy | Train_Recall | Test_Recall | Train_Precision | Test_Precision | Train_F1_Score | Test_F1_Score |
|---|---|---|---|---|---|---|---|---|---|
| 0 | AdaBoost - default parameters | 0.849 | 0.847 | 0.331 | 0.322 | 0.712 | 0.701 | 0.452 | 0.442 |
| 1 | AdaBoost - Tunned | 0.990 | 0.864 | 0.964 | 0.576 | 0.981 | 0.660 | 0.973 | 0.615 |
| 2 | Gradient Boosting - default parameters | 0.888 | 0.868 | 0.449 | 0.395 | 0.909 | 0.807 | 0.601 | 0.530 |
| 3 | Gradient Boosting - init | 0.886 | 0.866 | 0.452 | 0.384 | 0.890 | 0.797 | 0.599 | 0.518 |
| 4 | Gradient Boosting - Tunned | 0.923 | 0.880 | 0.623 | 0.482 | 0.948 | 0.801 | 0.752 | 0.602 |
| 5 | XGB - deafult parameters | 1.000 | 0.926 | 0.998 | 0.692 | 1.000 | 0.888 | 0.999 | 0.778 |
| 6 | XGB - Tunned | 0.994 | 0.918 | 0.966 | 0.638 | 1.000 | 0.898 | 0.983 | 0.746 |
| 7 | XGB2 - Tunned other parameters | 0.906 | 0.863 | 0.935 | 0.815 | 0.684 | 0.600 | 0.790 | 0.691 |
| 8 | XGB_2 - deafult (No treatment Missing values) | 1.000 | 0.928 | 0.998 | 0.710 | 1.000 | 0.883 | 0.999 | 0.787 |
| 9 | XGB_2 - Tunned (No treatment Missing values) | 0.946 | 0.875 | 0.980 | 0.804 | 0.785 | 0.632 | 0.872 | 0.708 |
| 10 | XGB2_2 - Tunned other parameters (No treatment... | 0.909 | 0.862 | 0.929 | 0.826 | 0.694 | 0.595 | 0.794 | 0.692 |
| 11 | stacking_classifier | 0.918 | 0.882 | 0.630 | 0.489 | 0.908 | 0.808 | 0.744 | 0.609 |
XGB2 - Tunned other parameters and XGB_2 - Tunned (No treatment Missing values) both perform well, with test recall above 0.80 and far less overfitting than the default models, but XGB2_2 - Tunned other parameters (No treatment Missing values) achieves the best test recall (0.826).
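To make this comparison easier to read, the models can be ranked by test recall together with the train-test recall gap as a rough overfitting indicator. The sketch below copies the recall columns from the table above into plain Python so it runs without the notebook state; the `rows` and `ranked` names are introduced here for illustration.

```python
# (model, train recall, test recall) copied from the comparison table above.
rows = [
    ("AdaBoost - default parameters", 0.331, 0.322),
    ("AdaBoost - Tunned", 0.964, 0.576),
    ("Gradient Boosting - default parameters", 0.449, 0.395),
    ("Gradient Boosting - init", 0.452, 0.384),
    ("Gradient Boosting - Tunned", 0.623, 0.482),
    ("XGB - deafult parameters", 0.998, 0.692),
    ("XGB - Tunned", 0.966, 0.638),
    ("XGB2 - Tunned other parameters", 0.935, 0.815),
    ("XGB_2 - deafult (No treatment Missing values)", 0.998, 0.710),
    ("XGB_2 - Tunned (No treatment Missing values)", 0.980, 0.804),
    ("XGB2_2 - Tunned other parameters (No treatment Missing values)", 0.929, 0.826),
    ("stacking_classifier", 0.630, 0.489),
]

# Highest test recall first; the gap shows how much recall is lost on unseen data.
ranked = sorted(rows, key=lambda row: row[2], reverse=True)
for name, train_recall, test_recall in ranked[:3]:
    gap = train_recall - test_recall
    print(f"{name}: test recall {test_recall:.3f}, train-test gap {gap:.3f}")
```

Inside the notebook, `comparison_frame2.sort_values("Test_Recall", ascending=False)` gives the same ordering directly from the DataFrame.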
# pd.concat replaces DataFrame.append, which is no longer available in pandas 2.x
pd.concat([comparison_frame, comparison_frame2], ignore_index=True)
| | Model | Train_Accuracy | Test_Accuracy | Train_Recall | Test_Recall | Train_Precision | Test_Precision | Train_F1_Score | Test_F1_Score |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Decision Tree - default parameters | 1.000 | 0.885 | 1.000 | 0.714 | 1.000 | 0.689 | 1.000 | 0.701 |
| 1 | Decision Tree - Tunned | 0.796 | 0.803 | 0.648 | 0.663 | 0.469 | 0.483 | 0.544 | 0.559 |
| 2 | Bagging classifier - default parameters | 0.994 | 0.905 | 0.969 | 0.587 | 0.998 | 0.866 | 0.983 | 0.700 |
| 3 | Bagging Classifier - weighted | 0.994 | 0.898 | 0.967 | 0.547 | 0.998 | 0.858 | 0.983 | 0.668 |
| 4 | Bagging Classifier - Tunned | 0.716 | 0.714 | 0.666 | 0.703 | 0.362 | 0.365 | 0.469 | 0.480 |
| 5 | Random Forest - deafult parameters | 1.000 | 0.915 | 1.000 | 0.583 | 1.000 | 0.942 | 1.000 | 0.720 |
| 6 | Random Forest - weighted | 1.000 | 0.911 | 1.000 | 0.547 | 1.000 | 0.962 | 1.000 | 0.697 |
| 7 | Random Forest - Tunned | 0.903 | 0.869 | 0.848 | 0.674 | 0.699 | 0.646 | 0.766 | 0.660 |
| 8 | AdaBoost - default parameters | 0.849 | 0.847 | 0.331 | 0.322 | 0.712 | 0.701 | 0.452 | 0.442 |
| 9 | AdaBoost - Tunned | 0.990 | 0.864 | 0.964 | 0.576 | 0.981 | 0.660 | 0.973 | 0.615 |
| 10 | Gradient Boosting - default parameters | 0.888 | 0.868 | 0.449 | 0.395 | 0.909 | 0.807 | 0.601 | 0.530 |
| 11 | Gradient Boosting - init | 0.886 | 0.866 | 0.452 | 0.384 | 0.890 | 0.797 | 0.599 | 0.518 |
| 12 | Gradient Boosting - Tunned | 0.923 | 0.880 | 0.623 | 0.482 | 0.948 | 0.801 | 0.752 | 0.602 |
| 13 | XGB - deafult parameters | 1.000 | 0.926 | 0.998 | 0.692 | 1.000 | 0.888 | 0.999 | 0.778 |
| 14 | XGB - Tunned | 0.994 | 0.918 | 0.966 | 0.638 | 1.000 | 0.898 | 0.983 | 0.746 |
| 15 | XGB2 - Tunned other parameters | 0.906 | 0.863 | 0.935 | 0.815 | 0.684 | 0.600 | 0.790 | 0.691 |
| 16 | XGB_2 - deafult (No treatment Missing values) | 1.000 | 0.928 | 0.998 | 0.710 | 1.000 | 0.883 | 0.999 | 0.787 |
| 17 | XGB_2 - Tunned (No treatment Missing values) | 0.946 | 0.875 | 0.980 | 0.804 | 0.785 | 0.632 | 0.872 | 0.708 |
| 18 | XGB2_2 - Tunned other parameters (No treatment... | 0.909 | 0.862 | 0.929 | 0.826 | 0.694 | 0.595 | 0.794 | 0.692 |
| 19 | stacking_classifier | 0.918 | 0.882 | 0.630 | 0.489 | 0.908 | 0.808 | 0.744 | 0.609 |
Overall, XGBoost performs better on this dataset than the other models.
Since XGBoost handles missing values natively, we also trained it on the dataset without missing values treatment, and that version is the one that performs best.
We tested two different parameter grids on it, and the second one performed slightly better on recall while reducing overfitting even further.
XGB2_2 - Tunned other parameters (No treatment Missing values) is our best model: it reduces overfitting and achieves the highest test recall (0.826) of all the models. Its precision (0.595 on test) and F1 score (0.692 on test) are weaker, so the model still needs improvement.
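To see what a test recall of 0.826 and a test precision of 0.595 mean for a campaign, the approximate confusion-matrix counts on the test set can be reconstructed from the printed metrics. This is a sketch based on rounded outputs, so the counts are approximate; the 18% baseline is the purchase rate of the previous random-contact campaign stated in the problem description.

```python
# Test-set size and buyer share, copied from the outputs earlier in the notebook.
n_test = 1467
buyer_share = 0.188139

# Reported test metrics of the best model.
recall, precision = 0.826, 0.595

actual_buyers = round(n_test * buyer_share)  # customers who really bought
true_pos = round(recall * actual_buyers)     # buyers the model flags
contacted = round(true_pos / precision)      # all customers the model flags
false_pos = contacted - true_pos             # flagged customers who do not buy
false_neg = actual_buyers - true_pos         # buyers the model misses

baseline_rate = 0.18  # purchase rate when contacting customers at random
lift = precision / baseline_rate

print(actual_buyers, true_pos, contacted, false_pos, false_neg)
print(f"Share of customers contacted: {contacted / n_test:.1%}")
print(f"Conversion lift over random contact: {lift:.1f}x")
```

Under these assumptions, contacting roughly a quarter of the customers would reach most of the buyers, which is why the lower precision may be an acceptable trade-off for high recall.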
We have been able to build a predictive model:
a) that the company can deploy to identify customers who will be interested in buying a travel package.
b) that the company can use to find the key factors that have an impact on converting a customer into a buyer.
Factors that have an impact: Passport, Designation_Executive, Designation_VP and Marital Status_Single.
Customers with a passport are more likely to buy a travel package.
Customers whose designation is Executive have a higher chance of buying a travel package.
Customers whose marital status is Single or Unmarried are also more likely to buy the travel package.
The model can be improved with more data points and more information about each customer, which would allow it to compare patterns and make better predictions.
We should understand the needs of each customer profile and offer the most suitable travel package, considering every point that makes a difference and increases customer satisfaction.
Create strategies to make the best use of our pitch time (a duration of 5 to 10), since this is the mean time we spend with customers before they say yes or no.
The third and fourth follow-ups are KEY to converting a customer into a buyer, so we need to train our team to give the best presentation of the travel package and to identify customer needs.
A new BUSINESS travel package should be considered, targeting companies whose employees travel a lot.
The King and Super Deluxe packages need to be reviewed and improved to attract more buyers.